This is an archive of the discontinued LLVM Phabricator instance.

Differential D7005

Add tail call optimization for thumb1-only targets rev. 3
Needs Review, Public

Authored by Langohr on Jan 15 2015, 12:22 PM.

Details

Reviewers

compnerd
jroelofs

Summary

For tail calls identified during DAG generation, the target address is
loaded into a register via the constant pool.
If R3 is used for argument passing, the target address is forced
into hard register R12 to work around limitations of the thumb1 register
allocator with respect to the upper registers.

During epilogue generation, spill registers are restored within
the emit epilogue function. Four cases are to be distinguished:

1) If LR is not pushed on the stack, a BX is simply generated.
2) If LR is pushed on the stack and R3 is available as scratch, LR is restored after pop { ... } for the remaining callee saved regs.
3) If R3 is not available for LR restore, LR is restored before pop { ... } and the stack pointer is re-adjusted afterwards.
4) If all regs R0...R3 are used for function call parameters, the target address is copied to R12.
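
The case distinction above can be sketched as a small decision function. This is only an illustration under stated assumptions, not code from the patch: `LRRestore`, `EpiloguePlan`, and `planEpilogue` are hypothetical names, and the sketch assumes that R3 cannot serve as scratch whenever all of R0...R3 carry arguments.

```cpp
// Hypothetical names for illustration; they do not appear in the patch.
enum class LRRestore {
  NotNeeded, // LR was never pushed, a plain BX suffices
  AfterPop,  // R3 is free, so LR is reloaded after the pop
  BeforePop  // R3 is busy, LR is reloaded first and SP is fixed up afterwards
};

struct EpiloguePlan {
  LRRestore Restore; // how LR gets back its saved value
  bool TargetInR12;  // true when R0...R3 all carry arguments
};

// Assumption: when all four argument registers are in use, R3 is not
// available as scratch, so R3Free is ignored in that situation.
inline EpiloguePlan planEpilogue(bool LRPushed, bool R3Free,
                                 unsigned NumArgRegs) {
  const bool AllArgRegsUsed = NumArgRegs >= 4;
  if (!LRPushed)
    return {LRRestore::NotNeeded, AllArgRegsUsed};
  if (R3Free && !AllArgRegsUsed)
    return {LRRestore::AfterPop, false};
  return {LRRestore::BeforePop, AllArgRegsUsed};
}
```

Under these assumptions, `planEpilogue(true, true, 2)` selects the restore-after-pop variant, while a call with four register arguments and a pushed LR selects restore-before-pop together with the copy of the target address to R12.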

On a Cortex-M0, I counted that sequence 2) takes one cycle longer than a version based on BL / pop { ..., pc } without a tail call. Option 3) is 2 cycles slower than a version without a tail call (additional SP+=4), and option 4) is 3 cycles slower. Also, 2), 3) and 4) generate slightly larger code (have a look at the test cases).

In discussions on llvm-dev, some argued that for this reason tail call optimization should not be part of the default options. In my opinion, the precious stack memory saved is well worth it.

Event Timeline

Langohr updated this revision to Diff 18243. Jan 15 2015, 12:22 PM

Langohr retitled this revision to Add tail call optimization for thumb1-only targets.

Langohr updated this object.

Langohr edited the test plan for this revision.

Langohr added a reviewer: jmolloy.

Langohr set the repository for this revision to rL LLVM.

Langohr added a subscriber: Unknown Object (MLST).

Langohr added a reviewer: compnerd. Jan 15 2015, 12:33 PM

I think you should also add tests for tail-called functions that return things other than void (especially things wider than one register in size).

One more question: do these changes still work on armv4t when an arm function tail calls a thumb function (and vice versa)?

lib/Target/ARM/ARMISelLowering.cpp
1682	as mentioned in the other thread, these variable names feel quite long.
1688	I think the usual "llvm style" here is to put the &&'s at the end of the line, rather than at the beginning, and only break the line where it would go over 80 cols. Might be worthwhile to clang-format the patch.
1712	unnecessary whitespace
1713	ditto
lib/Target/ARM/Thumb1FrameLowering.cpp
420	would be better now to combine this line with the one below it.
568	Shorter name suggestion: `MustRestoreR4`.
580	This variable is never assigned to again.... Should pick one name for this, and stick with it, delete the other one.
587	A shorter name suggestion: `LRRestoreReg`.
596	Need a `RegToUseForLRRestore = ARM::R4;` here, otherwise it is used uninitialized later.
617	I think a better name for this would be "EmptyPop", with all of its defs & uses inverted.
666	Get rid of the variable `IsTailCallReturn` and just check the condition here.
684	changes here do nothing... would be better to revert them. same with the added newlines.
test/CodeGen/ARM/twoParametersTailCall_v6m.ll
20	I think all of the attributes and debug information can be dropped. Most tests put the `CHECK:` lines in between the `ret` and the '}' at the end of the function.

jroelofs added a reviewer: jroelofs. Jan 16 2015, 10:22 AM

jroelofs removed a subscriber: jroelofs.

Langohr updated this revision to Diff 18316. Jan 16 2015, 1:23 PM

Langohr retitled this revision from Add tail call optimization for thumb1-only targets to Add tail call optimization for thumb1-only targets rev. 3.

Langohr updated this object.

I think you should also add tests for tail-called functions that return things other than void (especially things wider than one register in size).

OK. Is done.

One more question: do these changes still work on armv4t when an arm function tail calls a thumb function (and vice versa)?

Tail calls occurring in ARM code will not be affected by the change.
Tail calls occurring in THUMB code will use BX for the call, so, as I understand it, the mode switch will happen according to bit #0. As far as I know, when resolving the jump target relocation for function entry point labels, bit #0 will be correctly set for thumb functions. I will double check that next week.
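
The interworking behaviour referred to here can be illustrated with a short sketch. It assumes an interworking-capable core such as ARMv4T, where BX takes the new instruction set state from bit #0 of the register operand; `InstrSet`, `stateAfterBX`, and `branchDestination` are hypothetical names used only for this example and are not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical helper names; not part of LLVM.
enum class InstrSet { ARM, Thumb };

// BX selects the instruction set state from bit 0 of its register operand:
// a set bit means the callee is Thumb code, a clear bit means ARM code.
inline InstrSet stateAfterBX(std::uint32_t Target) {
  return (Target & 1u) ? InstrSet::Thumb : InstrSet::ARM;
}

// The address actually branched to is the operand with bit 0 cleared.
// Assumption: the operand is a well-formed function address.
inline std::uint32_t branchDestination(std::uint32_t Target) {
  return Target & ~std::uint32_t{1};
}
```

This is why the relocation for a Thumb function's entry point has to produce an address with bit #0 set: otherwise a BX-based tail call would switch to ARM state.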

Langohr edited the test plan for this revision. Jan 16 2015, 2:13 PM

I've looked at the code of the register scavenger for thumb1.

There definitely *is* an issue. The register scavenger takes R12 without any further checks of usage.

We will need to implement some change in the scavenger to be on the safe side. Using the stack for scavenging will prove difficult, when considering possible alloca() uses.

Scavenging will never be necessary within the epilogue code. If the ldr rX; mov R12, rX sequence shows up just before the epilogue, we will not have a problem. Therefore, one might look for a way to force the mov R12, rX to be the very last instruction before epilogue generation.

At minimum, I'd add checks for usage of R12 in the target scavenger in order to run into an assert instead of silently generating bad code.

The other option I see is to make the register scavenger use LR when LR is in the CSI list. LR will be pushed and popped as soon as any GPR is spilled and restored. When the target address is loaded into R12 for tail calls, LR is always pushed. In this case, we may readily use LR instead of R12 for scavenging. The only issue I can imagine with this approach is a possible use of __builtin_return_address that might try to get the address from LR and interfere with scavenging.

Added check for use of R12 in register scavenging. Try to fall-back to LR in case that R12 is used in the basic block where the emergency condition occurs.
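
The fallback described above might look roughly like the following. This is a sketch, not the code added to Thumb1RegisterInfo.cpp: `ScavengeReg` and `pickScavengeReg` are hypothetical names, and the two boolean inputs stand in for whatever liveness and callee-saved-info queries the real scavenger performs.

```cpp
// Hypothetical names for illustration; not the patch's implementation.
enum class ScavengeReg { R12, LR, None };

// Prefer R12 as the emergency scratch register. If R12 is in use in the
// basic block, fall back to LR, which is only safe when LR has been saved
// (i.e. it is in the callee-saved info list). Otherwise report None so the
// caller can assert instead of silently generating bad code.
inline ScavengeReg pickScavengeReg(bool R12UsedInBlock, bool LRInCSI) {
  if (!R12UsedInBlock)
    return ScavengeReg::R12;
  if (LRInCSI)
    return ScavengeReg::LR;
  return ScavengeReg::None;
}
```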

When re-analyzing the prologue and epilogue generation code for thumb1, it seems to me that there is another issue related to the fix done in r210889. The code at HEAD still refers to DPR registers within a possibly available VFP unit and the corresponding register spilling and restoring. For thumb1, however, DPR registers are inaccessible. Prologue and epilogue should thus only consider the AAPCS calling convention and not AAPCS-VFP. I wonder whether this part of the code arrived via copy-and-paste from thumb2/ARM code?

Similarly, in thumb1 code, I find expressions like STI.isTargetMachO(). In my understanding MachO is only relevant for iOS systems. Is there really any iOS system requiring thumb1 CPUs?

I'd also like to see a test case where r7 is used for the scratch reg.

lib/Target/ARM/ARMISelLowering.cpp
1694	Control flow would be easier to read if you eliminate IsCallAddrRequiredInReg and just do the checks needed here.
test/CodeGen/ARM/twoParametersTailCall_v6m.ll
7	Now that you've dropped the metadata, you can also drop the #[0-9]'s

Added test case for use of R7 for LR restore. (This happens if fold SP update into push-pop adds a push r7).

Added stack pointer update similar to v4t epilogue code.

Added code for using LR instead of R12 as scavenge register in case R12 happens to be live-in to the basic block.

This looks good, but I don't feel comfortable signing off on it. @t.p.northover, any thoughts on this?

test/CodeGen/ARM/threeParametersTailCall_v6m.ll
28	I think there's only supposed to be one of these per file. Same for the triple.

Hello Jon,

This looks good, but I don't feel comfortable signing off on it. @t.p.northover, any thoughts on this?

Thank you for reviewing.

I agree. With a change like this, it is better to be on the safe side and have more than one review.
I would also delay committing until I have finished some regression tests on real-world hardware. I will not find the time this week, but I hope to next week. As mentioned, I have test code that extensively uses continuation-style coding with many tail calls.
As a second requirement for a commit, I will also have to provide fixes for the 4 LIT regression tests that will trigger false failures because of the changed epilogue patterns. I will remove the superfluous two lines within this change.

I'd say merge all the test cases into a single file (thumb1-tail-call.ll ?).

lib/Target/ARM/ARMISelLowering.cpp
1677	Unnecessary braces
1683	Unnecessary whitespace before the last ). Why not collapse this to: bool ForceCallAddrToIP = isTailCall && (RegsToPass.size() >= 4 && Subtarget->isThumb1Only());
1694	Unnecessary whitespace in condition.
1805	Unnecessary whitespace between getRegister and (.
lib/Target/ARM/Thumb1FrameLowering.cpp
335	Why not collapse this to: bool IsTailCallReturn = MBBI->getOpcode() == ARM::TCRETURNri;
361	I don't think that the parentheses are needed, and they detract from readability. I would swap the order since the negation check would short-circuit.
403	This should be on the same line as the brace. Can we perhaps fold the knowledge that the fold should be avoided in tail calls into tryFoldSPUpdateIntoPushPop? At the very least, I think the condition should be: else if (IsTailCallReturn || !tryFoldSPUpdateIntoPushPop(...))
413	Use a switch rather than the if cases?
428	Any reason to not use '='? That is the general style in LLVM.
441	Unnecessary braces.
447	Unnecessary space after the !.
451	Unnecessary space after assert. I would also drop the >= 0.
466	Space after the comma.
468	Unnecessary space before the ++. Actually, switch to the prefixed form as that is preferred in LLVM code style.
511	Perhaps const auto *RI would help reduce the wrapping?
527	Unnecessary braces
539	unnecessary space after assert.
542	The next line should fit on this line. Why the unnecessary dl variable declaration above? It's a single use, so why not inline it here?
656	Unnecessary empty line.
660	Seems like this could be written as: if (MBB.getLastNonDebugInstr()->getOpcode() == ARM::TCRETURNri) return true;
lib/Target/ARM/Thumb1RegisterInfo.cpp
425	The text feels weird. Reflow the text?
431	Unnecessary braces.
436	Is ScratchReg a better name perhaps?

In D7005#110256, @Langohr wrote:

When re-analyzing the prologue and epilogue generation code for thumb1, it seems to me that there is another issue related to the fix done in r210889. The code at HEAD still refers to DPR registers within a possibly available VFP unit and the corresponding register spilling and restoring. For thumb1, however, DPR registers are inaccessible. Prologue and epilogue should thus only consider the AAPCS calling convention and not AAPCS-VFP. I wonder whether this part of the code arrived via copy-and-paste from thumb2/ARM code?

It's required to support ARM/Thumb interworking when using SjLj exceptions.

jmolloy removed a reviewer: jmolloy. Mar 18 2015, 3:25 AM

Revision Contents

Path                                             Size
lib/Target/ARM/ARMISelLowering.cpp               44 lines
lib/Target/ARM/ARMSubtarget.cpp                  2 lines
lib/Target/ARM/Thumb1FrameLowering.cpp           192 lines
lib/Target/ARM/Thumb1RegisterInfo.cpp            33 lines
test/CodeGen/ARM/fourParametersTailCall_v6m.ll   35 lines
test/CodeGen/ARM/threeParametersTailCall_v6m.ll  47 lines
test/CodeGen/ARM/twoParametersTailCall_v6m.ll    20 lines

Diff 18402

lib/Target/ARM/ARMISelLowering.cpp

@@ 1,665 lines hidden @@ if (isTailCall) {
     for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i) {
       Chain = DAG.getCopyToReg(Chain, dl, RegsToPass[i].first,
                                RegsToPass[i].second, InFlag);
       InFlag = Chain.getValue(1);
     }
     InFlag = SDValue();
   }

+  // For thumb1 targets, if R3 is used for argument passing, we need
+  // to place the call target address in IP (i.e. R12).
+  bool IsR3UsedForArgumentPassing = false;
+  if (RegsToPass.size() >= 4) {
>> compnerd: Unnecessary braces
+    IsR3UsedForArgumentPassing = true;
+  }
+
+  bool ForceCallAddrToRegR12 = false;
+
>> jroelofs: as mentioned in the other thread, these variable names feel quite long.
+  if (isTailCall && IsR3UsedForArgumentPassing && Subtarget->isThumb1Only() )
>> compnerd: Unnecessary whitespace before the last ). Why not collapse this to: bool ForceCallAddrToIP = isTailCall && (RegsToPass.size() >= 4 && Subtarget->isThumb1Only());
+    ForceCallAddrToRegR12 = true;
+
   // If the callee is a GlobalAddress/ExternalSymbol node (quite common, every
   // direct call is) turn it into a TargetGlobalAddress/TargetExternalSymbol
   // node so that legalize doesn't hack it.
>> jroelofs: I think the usual "llvm style" here is to put the &&'s at the end of the line, rather than at the beginning, and only break the line where it would go over 80 cols. Might be worthwhile to clang-format the patch.
   bool isDirect = false;
   bool isARMFunc = false;
   bool isLocalARMFunc = false;
   ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();

-  if (EnableARMLongCalls) {
+  if (EnableARMLongCalls || (isTailCall && Subtarget->isThumb1Only() )) {
>> jroelofs: Control flow would be easier to read if you eliminate IsCallAddrRequiredInReg and just do the checks needed here.
>> compnerd: Unnecessary whitespace in condition.
     assert((Subtarget->isTargetWindows() ||
+            (isTailCall && Subtarget->isThumb1Only()) ||
             getTargetMachine().getRelocationModel() == Reloc::Static) &&
            "long-calls with non-static relocation model!");

     // Handle a global address or an external symbol. If it's not one of
     // those, the target's already in a register, so we don't need to do
     // anything extra.
     if (GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee)) {
       const GlobalValue *GV = G->getGlobal();
       // Create a constant pool entry for the callee address
       unsigned ARMPCLabelIndex = AFI->createPICLabelUId();
       ARMConstantPoolValue *CPV =
         ARMConstantPoolConstant::Create(GV, ARMPCLabelIndex, ARMCP::CPValue, 0);

       // Get the address of the callee into a register
       SDValue CPAddr = DAG.getTargetConstantPool(CPV, getPointerTy(), 4);
       CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr);
>> jroelofs: unnecessary whitespace
       Callee = DAG.getLoad(getPointerTy(), dl,
>> jroelofs: ditto
                            DAG.getEntryNode(), CPAddr,
                            MachinePointerInfo::getConstantPool(),
                            false, false, false, 0);
     } else if (ExternalSymbolSDNode *S=dyn_cast<ExternalSymbolSDNode>(Callee)) {
       const char *Sym = S->getSymbol();

       // Create a constant pool entry for the callee address
       unsigned ARMPCLabelIndex = AFI->createPICLabelUId();
@@ 72 lines hidden @@ if (isARMFunc && Subtarget->isThumb1Only() && !Subtarget->hasV5TOps()) {
       // On ELF targets for PIC code, direct calls should go through the PLT
       if (Subtarget->isTargetELF() &&
           getTargetMachine().getRelocationModel() == Reloc::PIC_)
         OpFlags = ARMII::MO_PLT;
       Callee = DAG.getTargetExternalSymbol(Sym, getPointerTy(), OpFlags);
     }
   }

+  if (ForceCallAddrToRegR12) {
+    Chain = DAG.getCopyToReg(Chain, dl, ARM::R12,
+                             Callee,Chain.getValue(1));
+    Callee = DAG.getRegister (ARM::R12,getPointerTy());
>> compnerd: Unnecessary whitespace between getRegister and (.
+  }
+
   // FIXME: handle tail calls differently.
   unsigned CallOpc;
   bool HasMinSizeAttr = MF.getFunction()->getAttributes().hasAttribute(
       AttributeSet::FunctionIndex, Attribute::MinSize);
   if (Subtarget->isThumb()) {
     if ((!isDirect || isARMFunc) && !Subtarget->hasV5TOps())
       CallOpc = ARMISD::CALL_NOLINK;
     else
@@ 199 lines hidden @@ ARMTargetLowering::IsEligibleForTailCallOptimization(SDValue Callee,
   if (CallerF->hasFnAttribute("interrupt"))
     return false;

   // Also avoid sibcall optimization if either caller or callee uses struct
   // return semantics.
   if (isCalleeStructRet || isCallerStructRet)
     return false;

-  // FIXME: Completely disable sibcall for Thumb1 since Thumb1RegisterInfo::
-  // emitEpilogue is not ready for them. Thumb tail calls also use t2B, as
-  // the Thumb1 16-bit unconditional branch doesn't have sufficient relocation
-  // support in the assembler and linker to be used. This would need to be
-  // fixed to fully support tail calls in Thumb1.
-  //
-  // Doing this is tricky, since the LDM/POP instruction on Thumb doesn't take
-  // LR. This means if we need to reload LR, it takes an extra instructions,
-  // which outweighs the value of the tail call; but here we don't know yet
-  // whether LR is going to be used. Probably the right approach is to
-  // generate the tail call here and turn it back into CALL/RET in
-  // emitEpilogue if LR is used.
-
-  // Thumb1 PIC calls to external symbols use BX, so they can be tail calls,
-  // but we need to make sure there are enough registers; the only valid
-  // registers are the 4 used for parameters. We don't currently do this
-  // case.
-  if (Subtarget->isThumb1Only())
-    return false;
-
   // Externally-defined functions with weak linkage should not be
   // tail-called on ARM when the OS does not support dynamic
   // pre-emption of symbols, as the AAELF spec requires normal calls
   // to undefined weak functions to be replaced with a NOP or jump to the
   // next instruction. The behaviour of branch instructions in this
   // situation (as used for tail calls) is implementation-defined, so we
   // cannot rely on the linker replacing the tail call with a return.
   if (GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee)) {
@@ 329 lines hidden @@

 bool ARMTargetLowering::mayBeEmittedAsTailCall(CallInst *CI) const {
   if (!Subtarget->supportsTailCall())
     return false;

   if (!CI->isTailCall() || getTargetMachine().Options.DisableTailCalls)
     return false;

-  return !Subtarget->isThumb1Only();
+  return true;
 }

 // ConstantPool, JumpTable, GlobalAddress, and ExternalSymbol are lowered as
 // their target counterpart wrapped in the ARMISD::Wrapper node. Suppose N is
 // one of the above mentioned nodes. It has to be wrapped because otherwise
 // Select(N) returns N. So the raw TargetGlobalAddress nodes, etc. can only
 // be used to form addressing mode. These wrapped nodes will be selected
 // into MOVi.
@@ 9,004 lines hidden @@

lib/Target/ARM/ARMSubtarget.cpp

@@ 256 lines hidden @@ void ARMSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {

   UseMovt = hasV6T2Ops() && ArmUseMOVT;

   if (isTargetMachO()) {
     IsR9Reserved = ReserveR9 || !HasV6Ops;
     SupportsTailCall = !isTargetIOS() || !getTargetTriple().isOSVersionLT(5, 0);
   } else {
     IsR9Reserved = ReserveR9;
-    SupportsTailCall = !isThumb1Only();
+    SupportsTailCall = true;
   }

   if (Align == DefaultAlign) {
     // Assume pre-ARMv6 doesn't support unaligned accesses.
     //
     // ARMv6 may or may not support unaligned accesses depending on the
     // SCTLR.U bit, which is architecture-specific. We assume ARMv6
     // Darwin and NetBSD targets support unaligned accesses, and others don't.
@@ 122 lines hidden @@

lib/Target/ARM/Thumb1FrameLowering.cpp

@@ 229 lines hidden @@ case ARM::LR:
           nullptr, MRI->getDwarfRegNum(Reg, true), MFI->getObjectOffset(FI)));
       BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
           .addCFIIndex(CFIIndex)
           .setMIFlags(MachineInstr::FrameSetup);
       break;
     }
   }


   // Adjust FP so it point to the stack slot that contains the previous FP.
   if (HasFP) {
     FramePtrOffsetInBlock += MFI->getObjectOffset(FramePtrSpillFI)
       + GPRCS1Size + ArgRegsSaveSize;
     AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tADDrSPi), FramePtr)
       .addReg(ARM::SP).addImm(FramePtrOffsetInBlock / 4)
       .setMIFlags(MachineInstr::FrameSetup));
     if(FramePtrOffsetInBlock) {
@@ 71 lines hidden @@ for (int i = 2, e = MI->getNumOperands() - 2; i != e; ++i)
       if (!isCalleeSavedRegister(MI->getOperand(i).getReg(), CSRegs))
         return false;
     return true;
   }
   return false;
 }

 void Thumb1FrameLowering::emitEpilogue(MachineFunction &MF,
                                        MachineBasicBlock &MBB) const {
   MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
   assert((MBBI->getOpcode() == ARM::tBX_RET ||
-          MBBI->getOpcode() == ARM::tPOP_RET) &&
-         "Can only insert epilog into returning blocks");
+          MBBI->getOpcode() == ARM::tPOP_RET ||
+          MBBI->getOpcode() == ARM::TCRETURNri)
+         && "Can only insert epilog into returning blocks "
+         "and tail calls with address in regs.");
+
+  bool IsTailCallReturn = false;
+  if (MBBI->getOpcode() == ARM::TCRETURNri)
+    IsTailCallReturn = true;
>> compnerd: Why not collapse this to: bool IsTailCallReturn = MBBI->getOpcode() == ARM::TCRETURNri;
+
   DebugLoc dl = MBBI->getDebugLoc();
   MachineFrameInfo *MFI = MF.getFrameInfo();
   ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
   const Thumb1RegisterInfo *RegInfo = static_cast<const Thumb1RegisterInfo *>(
       MF.getSubtarget().getRegisterInfo());
   const Thumb1InstrInfo &TII =
       *static_cast<const Thumb1InstrInfo *>(MF.getSubtarget().getInstrInfo());

   unsigned Align = MF.getTarget()
                        .getSubtargetImpl()
                        ->getFrameLowering()
                        ->getStackAlignment();
   unsigned ArgRegsSaveSize = AFI->getArgRegsSaveSize(Align);
   int NumBytes = (int)MFI->getStackSize();
   assert((unsigned)NumBytes >= ArgRegsSaveSize &&
          "ArgRegsSaveSize is included in NumBytes");
   const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs();
   unsigned FramePtr = RegInfo->getFrameRegister(MF);

   if (!AFI->hasStackFrame()) {
     if (NumBytes - ArgRegsSaveSize != 0)
       emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, NumBytes - ArgRegsSaveSize);
   } else {
-    // Unwind MBBI to point to first LDR / VLDRD.
-    if (MBBI != MBB.begin()) {
+    // Unwind MBBI to point to first LDR / VLDRD. Not for tail call returns!
+    if ((MBBI != MBB.begin()) && (!IsTailCallReturn)) {
>> compnerd: I don't think that the parentheses are needed, and they detract from readability. I would swap the order since the negation check would short-circuit.
       do
         --MBBI;
       while (MBBI != MBB.begin() && isCSRestore(MBBI, CSRegs));
       if (!isCSRestore(MBBI, CSRegs))
         ++MBBI;
     }

     // Move SP to start of FP callee save spill area.
@@ 21 lines hidden @@ if (AFI->shouldRestoreSPFromFP()) {
                              .addReg(FramePtr));
     } else {
       if (MBBI->getOpcode() == ARM::tBX_RET &&
           &MBB.front() != MBBI &&
           std::prev(MBBI)->getOpcode() == ARM::tPOP) {
         MachineBasicBlock::iterator PMBBI = std::prev(MBBI);
         if (!tryFoldSPUpdateIntoPushPop(STI, MF, PMBBI, NumBytes))
           emitSPUpdate(MBB, PMBBI, TII, dl, *RegInfo, NumBytes);
-      } else if (!tryFoldSPUpdateIntoPushPop(STI, MF, MBBI, NumBytes))
+      } else if (IsTailCallReturn) {
+        // Don't try to fold SP update into push pop for tail call returns.
+        emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, NumBytes);
+      }
+      else if (!tryFoldSPUpdateIntoPushPop(STI, MF, MBBI, NumBytes))
>> compnerd: This should be on the same line as the brace. Can we perhaps fold the knowledge that the fold should be avoided in tail calls into tryFoldSPUpdateIntoPushPop? At the very least, I think the condition should be: else if (IsTailCallReturn || !tryFoldSPUpdateIntoPushPop(...))
         emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, NumBytes);
     }
   }

bool IsV4PopReturn = false;		bool IsR4InCSI = false;
for (const CalleeSavedInfo &CSI : MFI->getCalleeSavedInfo())		bool IsR7InCSI = false;
		bool IsLRInCSI = false;

		for (const CalleeSavedInfo &CSI : MFI->getCalleeSavedInfo()) {
		if (CSI.getReg() == ARM::R4)
		compnerdUnsubmitted Not Done Reply Inline Actions Use a switch rather than the if cases? compnerd: Use a switch rather than the if cases?
		IsR4InCSI = true;
		if (CSI.getReg() == ARM::R7)
		IsR7InCSI = true;
if (CSI.getReg() == ARM::LR)		if (CSI.getReg() == ARM::LR)
IsV4PopReturn = true;		IsLRInCSI = true;
IsV4PopReturn &= STI.hasV4TOps() && !STI.hasV5TOps();		}

		jroelofsUnsubmitted Not Done Reply Inline Actions would be better now to combine this line with the one below it. jroelofs: would be better now to combine this line with the one below it.
		bool IsV4PopReturn = IsLRInCSI && STI.hasV4TOps() && !STI.hasV5TOps();

		if (IsTailCallReturn) {
		MBBI = MBB.getLastNonDebugInstr();

		// First restore callee saved registers. Unlike for normal returns
		// this is not done in restoreCalleeSavedRegisters.
		const std::vector<CalleeSavedInfo> &CSI(MFI->getCalleeSavedInfo());
		compnerd: Any reason to not use '='? That is the general style in LLVM.

		MachineFunction &MF = *MBB.getParent();
		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();

		// We need to additionally push/pop R4 in case that LR reconstruction
		// for tail calls requires R4 as scratch register.
		bool MustRestoreR4 = false;

		bool IsR3AvailableAsSpill = true;

		for (unsigned i = 0, e = MBBI->getNumOperands(); i != e; ++i) {
		MachineOperand &Operand = MBBI->getOperand(i);
		if (Operand.isReg()) {
		compnerd: Unnecessary braces.
		if (Operand.getReg() == ARM::R3)
		IsR3AvailableAsSpill = false;
		}
		}

		if (IsLRInCSI && ! IsR3AvailableAsSpill) {
		compnerd: Unnecessary space after the !.
		// We need to restore LR before pop
		// and need another scratch register for this purpose
		int StackSlotForSavedLR = CSI.size() - 1;
		assert (StackSlotForSavedLR >= 0 && "Wrong Stack slot for LR.");
		compnerd: Unnecessary space after assert. I would also drop the >= 0.

		unsigned LRRestoreReg;

		// Make sure that R4/R7 or R3 may be used as scratch.
		// Arrange for an additional tPUSH (R4) and pop {r4, ...} if necessary.
		if (IsR4InCSI)
		LRRestoreReg = ARM::R4;
		else if (IsR7InCSI)
		LRRestoreReg = ARM::R7;
		else {
		MustRestoreR4 = true;
		LRRestoreReg = ARM::R4;

		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tPUSH)))
		.addReg(ARM::R4,RegState::Kill);
		compnerd: Space after the comma.

		StackSlotForSavedLR ++;
		compnerd: Unnecessary space before the ++. Actually, switch to the prefixed form as that is preferred in LLVM code style.
		}

		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tLDRspi))
		.addReg(LRRestoreReg, RegState::Define)
		.addReg(ARM::SP)
		.addImm(StackSlotForSavedLR));

		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))
		.addReg(ARM::LR, RegState::Define)
		.addReg(LRRestoreReg, RegState::Kill));
		}

		MachineInstrBuilder MIB = BuildMI(MF, dl, TII.get(ARM::tPOP));
		AddDefaultPred(MIB);

		bool EmptyPop = true;

		if (MustRestoreR4) {
		MIB.addReg(ARM::R4, getDefRegState(true));
		EmptyPop = false;
		}

		for (unsigned i = CSI.size(); i != 0; --i) {
		unsigned Reg = CSI[i-1].getReg();

		if (Reg == ARM::R4 && MustRestoreR4)
		continue;

		if (Reg == ARM::LR)
		continue;

		MIB.addReg(Reg, getDefRegState(true));
		EmptyPop = false;
		}

		// It's illegal to emit pop instruction without operands.
		if (EmptyPop)
		MF.DeleteMachineInstr(MIB);
		else
		MBB.insert(MBBI, &*MIB);

		if (IsLRInCSI) {
		const Thumb1RegisterInfo *RegInfo =
		compnerd: Perhaps const auto *RI would help reduce the wrapping?
		static_cast<const Thumb1RegisterInfo *>
		(MF.getSubtarget().getRegisterInfo());

		if (IsR3AvailableAsSpill) {
		// Restore LR after pop possible.

		MachineInstrBuilder MIB = BuildMI(MF, dl, TII.get(ARM::tPOP));
		AddDefaultPred(MIB);
		MIB.addReg(ARM::R3, getDefRegState(true));
		MBB.insert(MBBI, &*MIB);

		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))
		.addReg(ARM::LR, RegState::Define)
		.addReg(ARM::R3, RegState::Kill));

		if (ArgRegsSaveSize) {
		compnerd: Unnecessary braces.
		emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, ArgRegsSaveSize);
		}

		} else {
		// Re-adjust stack pointer for LR content still residing on the stack.
		emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, 4 + ArgRegsSaveSize);
		}
		}

		MachineOperand &JumpTarget = MBBI->getOperand(0);

		assert (MBBI->getOpcode() == ARM::TCRETURNri);
		compnerd: Unnecessary space after assert.
		DebugLoc dl = MBBI->getDebugLoc();

		BuildMI(MBB, MBBI, dl,
		compnerd: The next line should fit on this line. Why the unnecessary dl variable declaration above? It's a single use, why not inline it here.
		TII.get(ARM::tTAILJMPr))
		.addReg(JumpTarget.getReg(), RegState::Kill);

		MachineInstr *NewMI = std::prev(MBBI);
		for (unsigned i = 1, e = MBBI->getNumOperands(); i != e; ++i)
		NewMI->addOperand(MBBI->getOperand(i));

		// Delete the pseudo instruction TCRETURN.
		MBB.erase(MBBI);
		MBBI = NewMI;
		return;
		}

// Unlike T2 and ARM mode, the T1 pop instruction cannot restore		// Unlike T2 and ARM mode, the T1 pop instruction cannot restore
// to LR, and we can't pop the value directly to the PC since		// to LR, and we can't pop the value directly to the PC since
// we need to update the SP after popping the value. So instead		// we need to update the SP after popping the value. So instead
// we have to emit:		// we have to emit:
// POP {r3}		// POP {r3}
// ADD sp, #offset		// ADD sp, #offset
// BX r3		// BX r3
// If this would clobber a return value, then generate this sequence instead:		// If this would clobber a return value, then generate this sequence instead:
// MOV ip, r3		// MOV ip, r3
// POP {r3}		// POP {r3}
// ADD sp, #offset		// ADD sp, #offset
// MOV lr, r3		// MOV lr, r3
// MOV r3, ip		// MOV r3, ip
		jroelofs: Shorter name suggestion: `MustRestoreR4`.
// BX lr		// BX lr
if (ArgRegsSaveSize || IsV4PopReturn) {		if (ArgRegsSaveSize || IsV4PopReturn) {
// Get the last instruction, tBX_RET		// Get the last instruction, tBX_RET
MBBI = MBB.getLastNonDebugInstr();		MBBI = MBB.getLastNonDebugInstr();
assert (MBBI->getOpcode() == ARM::tBX_RET);		assert (MBBI->getOpcode() == ARM::tBX_RET);
DebugLoc dl = MBBI->getDebugLoc();		DebugLoc dl = MBBI->getDebugLoc();

if (AFI->getReturnRegsCount() <= 3) {		if (AFI->getReturnRegsCount() <= 3) {
// Epilogue: pop saved LR to R3 and branch off it.		// Epilogue: pop saved LR to R3 and branch off it.
AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tPOP)))		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tPOP)))
.addReg(ARM::R3, RegState::Define);		.addReg(ARM::R3, RegState::Define);

		jroelofs: This variable is never assigned to again.... Should pick one name for this, and stick with it, delete the other one.
emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, ArgRegsSaveSize);		emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, ArgRegsSaveSize);

MachineInstrBuilder MIB =		MachineInstrBuilder MIB =
BuildMI(MBB, MBBI, dl, TII.get(ARM::tBX))		BuildMI(MBB, MBBI, dl, TII.get(ARM::tBX))
.addReg(ARM::R3, RegState::Kill);		.addReg(ARM::R3, RegState::Kill);
AddDefaultPred(MIB);		AddDefaultPred(MIB);
MIB.copyImplicitOps(&*MBBI);		MIB.copyImplicitOps(&*MBBI);
		jroelofs: A shorter name suggestion: `LRRestoreReg`.
// erase the old tBX_RET instruction		// erase the old tBX_RET instruction
MBB.erase(MBBI);		MBB.erase(MBBI);
} else {		} else {
AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))
.addReg(ARM::R12, RegState::Define)		.addReg(ARM::R12, RegState::Define)
.addReg(ARM::R3, RegState::Kill));		.addReg(ARM::R3, RegState::Kill));

AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tPOP)))		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tPOP)))
.addReg(ARM::R3, RegState::Define);		.addReg(ARM::R3, RegState::Define);
		jroelofs: Need a `RegToUseForLRRestore = ARM::R4;` here, otherwise it is used uninitialized later.

emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, ArgRegsSaveSize);		emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, ArgRegsSaveSize);

AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))
.addReg(ARM::LR, RegState::Define)		.addReg(ARM::LR, RegState::Define)
.addReg(ARM::R3, RegState::Kill));		.addReg(ARM::R3, RegState::Kill));

AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))
.addReg(ARM::R3, RegState::Define)		.addReg(ARM::R3, RegState::Define)
.addReg(ARM::R12, RegState::Kill));		.addReg(ARM::R12, RegState::Kill));
// Keep the tBX_RET instruction		// Keep the tBX_RET instruction
}		}
}		}
}		}
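
The tail-call branch of emitEpilogue above picks between three ways of restoring LR. The following is a minimal standalone sketch of that decision, not the patch itself. `Reg`, `LRRestore`, `TailCallEpiloguePlan` and `planTailCallEpilogue` are hypothetical names introduced for illustration, and the model assumes, as the patch does, that the saved LR sits in the highest slot of the push block. The scan uses a switch, along the lines of the reviewer's suggestion.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the ARM physical registers the patch inspects.
enum class Reg { R3, R4, R5, R6, R7, LR };

// How the tail-call epilogue gets LR back (illustrative names only).
enum class LRRestore { None, BeforePop, AfterPopViaR3 };

struct TailCallEpiloguePlan {
  LRRestore Kind = LRRestore::None;
  Reg ScratchReg = Reg::R4;   // only meaningful for BeforePop
  bool MustRestoreR4 = false; // extra push {r4} / pop {r4} around the reload
  unsigned LRSlotWords = 0;   // word offset of the saved LR from SP (BeforePop)
};

TailCallEpiloguePlan planTailCallEpilogue(const std::vector<Reg> &CSI,
                                          bool R3IsCallOperand) {
  bool IsR4InCSI = false, IsR7InCSI = false, IsLRInCSI = false;
  for (Reg R : CSI) {
    switch (R) {
    case Reg::R4: IsR4InCSI = true; break;
    case Reg::R7: IsR7InCSI = true; break;
    case Reg::LR: IsLRInCSI = true; break;
    default: break;
    }
  }

  TailCallEpiloguePlan Plan;
  if (!IsLRInCSI)
    return Plan; // LR was never pushed, so a plain branch suffices.

  if (!R3IsCallOperand) {
    // R3 is free: pop the remaining registers, then pop LR's slot into R3.
    Plan.Kind = LRRestore::AfterPopViaR3;
    return Plan;
  }

  // R3 carries an argument or the target address: reload LR before the pop.
  Plan.Kind = LRRestore::BeforePop;
  assert(!CSI.empty() && "LR is in CSI, so CSI cannot be empty");
  Plan.LRSlotWords = static_cast<unsigned>(CSI.size()) - 1;
  if (IsR4InCSI) {
    Plan.ScratchReg = Reg::R4;
  } else if (IsR7InCSI) {
    Plan.ScratchReg = Reg::R7;
  } else {
    // No callee-saved low register to borrow: push R4 temporarily,
    // which moves the saved LR one slot further away from SP.
    Plan.MustRestoreR4 = true;
    Plan.ScratchReg = Reg::R4;
    ++Plan.LRSlotWords;
  }
  return Plan;
}
```

With `{R4, LR}` saved and R3 in use, the sketch yields a reload from word slot 1 through R4, which appears consistent with the `ldr r4, [sp, #4]` line in the four-parameter test, assuming the load immediate is scaled by the word size.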

bool Thumb1FrameLowering::		bool Thumb1FrameLowering::
spillCalleeSavedRegisters(MachineBasicBlock &MBB,		spillCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {
if (CSI.empty())		if (CSI.empty())
		jroelofs: I think a better name for this would be "EmptyPop", with all of its defs & uses inverted.
return false;		return false;

DebugLoc DL;		DebugLoc DL;
MachineFunction &MF = *MBB.getParent();		MachineFunction &MF = *MBB.getParent();
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();

if (MI != MBB.end()) DL = MI->getDebugLoc();		if (MI != MBB.end()) DL = MI->getDebugLoc();

[22 lines not shown]
return true;		return true;
}		}

bool Thumb1FrameLowering::		bool Thumb1FrameLowering::
restoreCalleeSavedRegisters(MachineBasicBlock &MBB,		restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {

		compnerd: Unnecessary empty line.
		MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();

		if(MBBI->getOpcode() == ARM::TCRETURNri)
		return true; // Handle pop generation in emitEpilogue
		compnerd: Seems like this could be written as: if (MBB.getLastNonDebugInstr()->getOpcode() == ARM::TCRETURNri) return true;

if (CSI.empty())		if (CSI.empty())
return false;		return false;

MachineFunction &MF = *MBB.getParent();		MachineFunction &MF = *MBB.getParent();
ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();		ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
		jroelofs: Get rid of the variable `IsTailCallReturn` and just check the condition here.
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();

bool isVarArg = AFI->getArgRegsSaveSize() > 0;		bool isVarArg = AFI->getArgRegsSaveSize() > 0;
DebugLoc DL = MI->getDebugLoc();		DebugLoc DL = MI->getDebugLoc();

MachineInstrBuilder MIB = BuildMI(MF, DL, TII.get(ARM::tPOP));		MachineInstrBuilder MIB = BuildMI(MF, DL, TII.get(ARM::tPOP));
AddDefaultPred(MIB);		AddDefaultPred(MIB);

bool NumRegs = false;		bool IsEmptyPop = true;
for (unsigned i = CSI.size(); i != 0; --i) {		for (unsigned i = CSI.size(); i != 0; --i) {
unsigned Reg = CSI[i-1].getReg();		unsigned Reg = CSI[i-1].getReg();
if (Reg == ARM::LR) {		if (Reg == ARM::LR) {
// Special epilogue for vararg functions. See emitEpilogue		// Special epilogue for vararg functions. See emitEpilogue
if (isVarArg)		if (isVarArg)
continue;		continue;
// ARMv4T requires BX, see emitEpilogue		// ARMv4T requires BX, see emitEpilogue
if (STI.hasV4TOps() && !STI.hasV5TOps())		if (STI.hasV4TOps() && !STI.hasV5TOps())
continue;		continue;
Reg = ARM::PC;		Reg = ARM::PC;
		jroelofs: changes here do nothing... would be better to revert them. same with the added newlines.
(*MIB).setDesc(TII.get(ARM::tPOP_RET));		(*MIB).setDesc(TII.get(ARM::tPOP_RET));
MIB.copyImplicitOps(&*MI);		MIB.copyImplicitOps(&*MI);
MI = MBB.erase(MI);		MI = MBB.erase(MI);
}		}
MIB.addReg(Reg, getDefRegState(true));		MIB.addReg(Reg, getDefRegState(true));
NumRegs = true;		IsEmptyPop = false;
}		}

// It's illegal to emit pop instruction without operands.		// It's illegal to emit pop instruction without operands.
if (NumRegs)		if (IsEmptyPop)
MBB.insert(MI, &*MIB);
else
MF.DeleteMachineInstr(MIB);		MF.DeleteMachineInstr(MIB);
		else
		MBB.insert(MI, &*MIB);

return true;		return true;
}		}
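
The register list built by restoreCalleeSavedRegisters for the non-tail-call path can be modeled in isolation. This is a rough sketch under stated assumptions: registers are plain strings, `PopPlan` and `planPop` are hypothetical names, and the order of the input list is arbitrary rather than LLVM's actual CSI order. It shows the two cases in which LR is skipped, the LR to PC substitution that turns the pop into a returning pop, and why an empty list must not be emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the pop that restoreCalleeSavedRegisters builds.
struct PopPlan {
  std::vector<std::string> Regs; // operands in the order they are added
  bool IsPopRet = false;         // true once LR has been replaced by PC
  // A pop without operands is illegal, so an empty plan must be dropped.
  bool IsEmptyPop() const { return Regs.empty(); }
};

PopPlan planPop(const std::vector<std::string> &CSI, bool IsVarArg,
                bool IsV4TOnly) {
  PopPlan Plan;
  // Walk the callee-saved list back to front, as the patch does.
  for (std::size_t i = CSI.size(); i != 0; --i) {
    std::string Reg = CSI[i - 1];
    if (Reg == "lr") {
      // Vararg functions and ARMv4T get their special epilogue elsewhere,
      // so LR is left on the stack here.
      if (IsVarArg || IsV4TOnly)
        continue;
      // Otherwise pop straight into PC, which also performs the return.
      Reg = "pc";
      Plan.IsPopRet = true;
    }
    Plan.Regs.push_back(Reg);
  }
  return Plan;
}
```

For example, `planPop({"r4", "lr"}, false, false)` produces a returning pop of `pc` and `r4`, while `planPop({"lr"}, false, true)` produces an empty plan that the caller would have to discard.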

lib/Target/ARM/Thumb1RegisterInfo.cpp

	[411 lines not shown]
	Thumb1RegisterInfo::saveScavengerRegister(MachineBasicBlock &MBB,			Thumb1RegisterInfo::saveScavengerRegister(MachineBasicBlock &MBB,
	MachineBasicBlock::iterator I,			MachineBasicBlock::iterator I,
	MachineBasicBlock::iterator &UseMI,			MachineBasicBlock::iterator &UseMI,
	const TargetRegisterClass *RC,			const TargetRegisterClass *RC,
	unsigned Reg) const {			unsigned Reg) const {
	// Thumb1 can't use the emergency spill slot on the stack because			// Thumb1 can't use the emergency spill slot on the stack because
	// ldr/str immediate offsets must be positive, and if we're referencing			// ldr/str immediate offsets must be positive, and if we're referencing
	// off the frame pointer (if, for example, there are alloca() calls in			// off the frame pointer (if, for example, there are alloca() calls in
	// the function, the offset will be negative. Use R12 instead since that's			// the function, the offset will be negative.
	// a call clobbered register that we know won't be used in Thumb1 mode.			// We need a register as emergency spill slot.
				// Use candidates are R12 and LR. R12 might be used in tail calls
				// and LR might be used if @llvm.returnaddress is taken.
				// Both are call clobbered register that otherwise won't be used in
				// Thumb1 mode.
				compnerd: The text feels weird. Reflow the text?

				MachineFunction &MF = *MBB.getParent();
				MachineFrameInfo *MFI = MF.getFrameInfo();

				bool IsLRInCSI = false;
				for (const CalleeSavedInfo &CSI : MFI->getCalleeSavedInfo()) {
				compnerd: Unnecessary braces.
				if (CSI.getReg() == ARM::LR)
				IsLRInCSI = true;
				}

				unsigned ScavengeReg = ARM::R12;
				compnerd: Is ScratchReg a better name perhaps?
				if (IsLRInCSI && MBB.isLiveIn(ScavengeReg) && !MFI->isReturnAddressTaken())
				ScavengeReg = ARM::LR;

				assert(!MBB.isLiveIn(ScavengeReg) &&
				"No Scavenge register available.");

	const TargetInstrInfo &TII = *MBB.getParent()->getSubtarget().getInstrInfo();			const TargetInstrInfo &TII = *MBB.getParent()->getSubtarget().getInstrInfo();
	DebugLoc DL;			DebugLoc DL;
	AddDefaultPred(BuildMI(MBB, I, DL, TII.get(ARM::tMOVr))			AddDefaultPred(BuildMI(MBB, I, DL, TII.get(ARM::tMOVr))
	.addReg(ARM::R12, RegState::Define)			.addReg(ScavengeReg, RegState::Define)
	.addReg(Reg, RegState::Kill));			.addReg(Reg, RegState::Kill));

	// The UseMI is where we would like to restore the register. If there's			// The UseMI is where we would like to restore the register. If there's
	// interference with R12 before then, however, we'll need to restore it			// interference with R12 before then, however, we'll need to restore it
	// before that instead and adjust the UseMI.			// before that instead and adjust the UseMI.
	bool done = false;			bool done = false;
	for (MachineBasicBlock::iterator II = I; !done && II != UseMI ; ++II) {			for (MachineBasicBlock::iterator II = I; !done && II != UseMI ; ++II) {
	if (II->isDebugValue())			if (II->isDebugValue())
	continue;			continue;
	// If this instruction affects R12, adjust our restore point.			// If this instruction affects R12, adjust our restore point.
	for (unsigned i = 0, e = II->getNumOperands(); i != e; ++i) {			for (unsigned i = 0, e = II->getNumOperands(); i != e; ++i) {
	const MachineOperand &MO = II->getOperand(i);			const MachineOperand &MO = II->getOperand(i);
	if (MO.isRegMask() && MO.clobbersPhysReg(ARM::R12)) {			if (MO.isRegMask() && MO.clobbersPhysReg(ScavengeReg)) {
	UseMI = II;			UseMI = II;
	done = true;			done = true;
	break;			break;
	}			}
	if (!MO.isReg() || MO.isUndef() || !MO.getReg() ||			if (!MO.isReg() || MO.isUndef() || !MO.getReg() ||
	TargetRegisterInfo::isVirtualRegister(MO.getReg()))			TargetRegisterInfo::isVirtualRegister(MO.getReg()))
	continue;			continue;
	if (MO.getReg() == ARM::R12) {			if (MO.getReg() == ScavengeReg) {
	UseMI = II;			UseMI = II;
	done = true;			done = true;
	break;			break;
	}			}
	}			}
	}			}
	// Restore the register from R12			// Restore the register from R12
	AddDefaultPred(BuildMI(MBB, UseMI, DL, TII.get(ARM::tMOVr)).			AddDefaultPred(BuildMI(MBB, UseMI, DL, TII.get(ARM::tMOVr)).
	addReg(Reg, RegState::Define).addReg(ARM::R12, RegState::Kill));			addReg(Reg, RegState::Define).addReg(ScavengeReg, RegState::Kill));

	return true;			return true;
	}			}

	void			void
	Thumb1RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,			Thumb1RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
	int SPAdj, unsigned FIOperandNum,			int SPAdj, unsigned FIOperandNum,
	RegScavenger *RS) const {			RegScavenger *RS) const {
	[124 lines not shown]
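
The two decisions made in saveScavengerRegister above (which register to borrow, and where to restore from it) can be sketched without any LLVM types. Everything here is an assumption made for illustration: `Reg`, `ScavengeQuery`, `pickScavengeReg` and `findRestorePoint` are hypothetical names, an instruction is reduced to the set of physical registers it mentions or clobbers, and the patch's assert is modeled as a `false` return.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-ins for a few ARM physical registers.
enum class Reg { R0, R1, R2, R3, R12, LR };

struct ScavengeQuery {
  bool IsLRInCSI;          // LR was saved in the prologue
  bool ReturnAddressTaken; // @llvm.returnaddress is used
  std::set<Reg> LiveIns;   // registers live into the block
};

// Mirrors the selection in the patch: prefer R12, fall back to LR when R12
// is live-in, LR is saved and the return address is not taken. Returns false
// in the situation where the patch's assert would fire.
bool pickScavengeReg(const ScavengeQuery &Q, Reg &Out) {
  Reg Candidate = Reg::R12;
  if (Q.IsLRInCSI && Q.LiveIns.count(Reg::R12) && !Q.ReturnAddressTaken)
    Candidate = Reg::LR;
  if (Q.LiveIns.count(Candidate))
    return false;
  Out = Candidate;
  return true;
}

// An instruction is modeled as the set of registers it mentions or clobbers.
using Instr = std::set<Reg>;

// Returns the index at which the spilled value has to be restored: the first
// instruction in [I, UseMI) that touches the scavenge register, else UseMI.
std::size_t findRestorePoint(const std::vector<Instr> &Block, std::size_t I,
                             std::size_t UseMI, Reg Scavenge) {
  assert(I <= UseMI && UseMI <= Block.size());
  for (std::size_t II = I; II != UseMI; ++II)
    if (Block[II].count(Scavenge))
      return II;
  return UseMI;
}
```

In this model, a block whose second instruction mentions R12 moves the restore point for an R12 spill up to index 1, whereas an LR spill in the same block can stay live until the original use.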

test/CodeGen/ARM/fourParametersTailCall_v6m.ll

				; RUN: llc -mtriple=thumbv6m-none--eabi -O3 %s -o - | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv6m-none--eabi"

				define void @hugo(i32 %a, i32 %b, i32 %c, i32 %d) {
				tail call void @peter(i32 %a, i32 %b, i32 %c, i32 %d)
				ret void
				; CHECK: ldr r4, .LCPI0_0
				; CHECK: mov r12, r4
				; CHECK: ldr r4, [sp, #4]
				; CHECK: mov lr, r4
				; CHECK: pop {r4}
				; CHECK: add sp, #4
				; CHECK: bx r12
				; CHECK: .long peter
				}

				declare void @peter(i32, i32, i32, i32)

				define i64 @hugo64(i32 %a, i32 %b, i32 %c, i32 %d) {
				entry:
				%call = tail call i64 @peter64(i32 %a, i32 %b, i32 %c, i32 %d)
				ret i64 %call
				; CHECK: ldr r4, .LCPI
				; CHECK: mov r12, r4
				; CHECK: ldr r4, [sp, #4]
				; CHECK: mov lr, r4
				; CHECK: pop {r4}
				; CHECK: add sp, #4
				; CHECK: bx r12
				; CHECK: .long peter64
				}

				declare i64 @peter64(i32, i32, i32, i32)

test/CodeGen/ARM/threeParametersTailCall_v6m.ll

				; RUN: llc -mtriple=thumbv6m-none--eabi -O3 %s -o - | FileCheck %s

				; ModuleID = 'threeParameters.c'
				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv6m-none--eabi"

				define void @hugo(i32 %a, i32 %b, i32 %c) {
				tail call void @peter(i32 %a, i32 %b, i32 %c)
				ret void
				; CHECK: ldr r3, .LCPI0_0
				; CHECK: bx r3
				; CHECK: .long peter
				}

				declare void @peter(i32, i32, i32)

				define i64 @hugo64(i32 %a, i32 %b, i32 %c) {
				%call = tail call i64 @peter64(i32 %a, i32 %b, i32 %c)
				ret i64 %call
				; CHECK: ldr r3, .LCPI
				; CHECK: bx r3
				; CHECK: .long peter64
				}

				declare i64 @peter64(i32, i32, i32)


				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
				jroelofs: I think there's only supposed to be one of these per file. Same for the triple.
				target triple = "thumbv6m-none--eabi"

				define void @paul(i32 %a, i32 %b, i32 %c) {
				%1 = tail call i32 @otto()
				tail call void @anna(i32 %1, i32 %1, i32 %1)
				ret void
				; CHECK: ldr r3, .LCPI
				; CHECK: ldr r7, [sp, #4]
				; CHECK: mov lr, r7
				; CHECK: pop {r7}
				; CHECK: add sp, #4
				; CHECK: bx r3
				; CHECK: .long anna
				}

				declare i32 @otto()

				declare void @anna(i32, i32, i32)

test/CodeGen/ARM/twoParametersTailCall_v6m.ll

				; RUN: llc -mtriple=thumbv6m-none--eabi -O3 %s -o - | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv6m-none--eabi"

				define void @hugo(i32 %a, i32 %b, i32 %c) {
				entry:
				jroelofs: Now that you've dropped the metadata, you can also drop the #[0-9]'s
				tail call void @nonTailCall()
				tail call void @peter(i32 %a, i32 %b)
				ret void
				; CHECK: pop {r3}
				; CHECK: mov lr, r3
				; CHECK: bx r2
				; CHECK: .long peter
				}

				declare void @nonTailCall()

				declare void @peter(i32, i32)

				jroelofs: I think all of the attributes and debug information can be dropped. Most tests put the `CHECK:` lines in between the `ret` and the '}' at the end of the function.