This is an archive of the discontinued LLVM Phabricator instance.

AArch64: Add option to use shared epilogues in compiler-rt
Needs Review · Public

Authored by MatzeB on Dec 16 2015, 7:25 PM.

Details

Reviewers

kristof.beyls
aadg
t.p.northover

Summary

Most AArch64 function epilogues look the same: a series of ldp
instructions followed by a ret. In fact, almost all epilogues fall into one of
16 patterns. These 16 epilogues are put into compiler-rt to be shared,
and the epilogue code is replaced with a jump to these epilogues.

In a test suite compiled with -Os, sharing the epilogues gives a 1.7%
code size reduction.
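
As a rough illustration of where the saving comes from: every A64 instruction is 4 bytes wide, so an inline epilogue of one ldp per register pair plus a ret shrinks to a single branch. The sketch below counts only the callee-save restores and the ret. The function and parameter names are hypothetical, and the one-time cost of the shared copies in compiler-rt is ignored.

```cpp
#include <cstddef>

// Every AArch64 (A64) instruction is 4 bytes wide.
constexpr std::size_t kInstrBytes = 4;

// Bytes removed from one function when its inline epilogue (one ldp per
// register pair plus a ret) is replaced by a single branch to shared code.
// numRestoredRegs is a hypothetical name; it plays the role of CSI.size().
constexpr std::size_t bytesSavedPerEpilogue(std::size_t numRestoredRegs) {
  const std::size_t inlineInstrs = numRestoredRegs / 2 + 1; // ldp... + ret
  const std::size_t sharedInstrs = 1;                       // b __epilogue_...
  return (inlineInstrs - sharedInstrs) * kInstrBytes;
}
```

Under these assumptions an LR,FP-only epilogue (2 registers) saves 4 bytes per function, and an epilogue restoring LR, FP and X19 through X28 (12 registers) saves 24 bytes.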

This patch adds the -aarch64-shared-epilogues switch. I will perform more
benchmarking to decide whether this is a candidate for -Os or -Oz.

Related to rdar://23082514

Diff Detail

Repository: rL LLVM

Event Timeline

MatzeB updated this revision to Diff 43096. Dec 16 2015, 7:25 PM

MatzeB retitled this revision to AArch64: Add option to use shared epilogues in compiler-rt.

MatzeB updated this object.

MatzeB added reviewers: t.p.northover, jmolloy, mcrosier, aadg.

MatzeB set the repository for this revision to rL LLVM.

MatzeB added subscribers: llvm-commits, ab.

Herald added subscribers: rengolin, aemerson. Dec 16 2015, 7:25 PM

MatzeB added a parent revision: D15601: Add epilogues for arm64 epilogue sharing logic. Dec 16 2015, 7:27 PM

kristof.beyls added a subscriber: kristof.beyls. Dec 17 2015, 12:10 AM

I think the general idea of sharing epilogues is a good idea - at the very least when optimizing for size.
Did you also happen to measure the impact on performance?

Overall, I'm wondering if it wouldn't be better to let the compiler put the epilogue functions in comdat sections (or the equivalent for non-ELF object formats), rather than having them in compiler-rt. I think doing so would have the following advantages:

It's possible to catch all epilogues, not just the N (16 in the attached patch) most often used ones as seen in a benchmark corpus.
The epilogues can more easily be tuned for specific cores when the epilogues are produced by the compiler rather than being stored in compiler-rt. E.g. I've been told that this technique also has been used effectively in other compilers when targeting AArch32. On some AArch32 cores using LDRD tends to be more efficient than using LDM.
My gut feel is that if over time we want to modify epilogues; a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.

Obviously, a well-defined naming scheme will be needed to define the epilogue functions (e.g. should they contain a version number?), but I think that's true no matter whether the epilogue functions are produced by the compiler or inserted into compiler-rt.

This also made me wonder if something similar could be done for function prologues? I couldn't immediately think of why it would be impossible - but the overheads involved probably will be higher than with epilogues, e.g. having to do a call to a prologue function, rather than doing a branch to an epilogue function?

In D15600#314648, @kristof.beyls wrote:

I think the general idea of sharing epilogues is a good idea - at the very least when optimizing for size.
Did you also happen to measure the impact on performance?

Overall, I'm wondering if it wouldn't be better to let the compiler put the epilogue functions in comdat sections (or the equivalent for non-ELF object formats), rather than having them in compiler-rt. I think doing so would have the following advantages:

It's possible to catch all epilogues, not just the N (16 in the attached patch) most often used ones as seen in a benchmark corpus.

The epilogues can more easily be tuned for specific cores when the epilogues are produced by the compiler rather than being stored in compiler-rt. E.g. I've been told that this technique also has been used effectively in other compilers when targeting AArch32. On some AArch32 cores using LDRD tends to be more efficient than using LDM.

My gut feel is that if over time we want to modify epilogues; a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.

Yes, I agree, and I have been thinking about this as well. I disregarded the idea when I realized that we have no infrastructure to place basic blocks into different sections.
However, thinking about this now, it may be possible to create pseudo functions on the fly, just like the pseudo functions I put into compiler-rt. I'll look into this.

Obviously, a well-defined naming scheme will be needed to define the epilogue functions (e.g. should they contain a version number?), but I think that's true no matter whether the epilogue functions are produced by the compiler or inserted into compiler-rt.

We can just describe the contents of the block in a unique way (in this implementation the name contains all the restored registers in order of restoration).
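
A minimal standalone sketch of that naming scheme, mirroring the loop in tryJumpToSharedEpilogue from the attached diff. Plain strings stand in for LLVM register names, the function name makeEpilogueName is hypothetical, and the platform symbol prefix that the patch adds via Mangler::getNameWithPrefix is left out.

```cpp
#include <string>
#include <vector>

// Builds the symbol name of a shared epilogue from the registers it
// restores, listed in restoration order.
std::string makeEpilogueName(const std::vector<std::string> &restoredRegs) {
  std::string name = "__epilogue";
  for (const std::string &reg : restoredRegs) {
    name += '_';
    name += reg;
  }
  return name;
}
```

With this scheme, the register list X19, X20, X21, X22 maps to __epilogue_X19_X20_X21_X22, so two epilogues share a symbol only if they restore the same registers in the same order.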

This also made me wonder if something similar could be done for function prologues? I couldn't immediately think of why it would be impossible - but the overheads involved probably will be higher than with epilogues, e.g. having to do a call to a prologue function, rather than doing a branch to an epilogue function?

It may be possible to do something with the prologues as well, but as these require a function call or similar mechanism the performance impact seemed bigger.

For reference, Andrew Waterman and others in the RISC-V team looked at using function calls to register store/load helper functions to reduce size overhead in epilogues and prologues as an alternative to supporting load-multiple and store-multiple in the ISA. David Patterson described some of this work at the last RISC-V workshop http://riscv.org/workshop-jun2015/riscv-compressed-workshop-june2015.pdf. See slide 15.

jmolloy resigned from this revision.Jan 6 2016, 5:29 AM

jmolloy edited reviewers, added: kristof.beyls; removed: jmolloy.

In D15600#314648, @kristof.beyls wrote:

My gut feel is that if over time we want to modify epilogues; a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.

We don't claim to support non-matching compiler-rt versions though, do we? I thought users/distros were supposed to always use the correct version.

lib/Target/AArch64/AArch64FrameLowering.cpp
Line 856: When does RET occur here? I can't remember a way to bypass RET_ReallyLR.

In D15600#323879, @ab wrote:

In D15600#314648, @kristof.beyls wrote:

My gut feel is that if over time we want to modify epilogues; a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.

We don't claim to support non-matching compiler-rt versions though, do we? I thought users/distros were supposed to always use the correct version.

I'm thinking of the case where object files compiled with different LLVM versions are linked together. If this isn't supported, it e.g. makes it pretty hard/impossible to distribute a library of binary code that can be linked with code generated by a number of different versions of LLVM. Allowing people that ship binary libraries to not have to ship a separate library for every single revision of clang/llvm seems like a good thing to me. Always requiring linking against the compiler-rt run-time library would probably also make it near-impossible to link together code produced by different compilers, unless these epilogue functions end up being defined in a de facto runtime library ABI?
I'm not sure if there's an official policy on this though.

All in all, it seems simpler to me to not have these epilogue functions in compiler-rt, but rather produce them in every object file that relies on them.

mcrosier resigned from this revision.Jan 12 2016, 7:53 AM

mcrosier removed a reviewer: mcrosier.

I looked into producing comdat functions and unfortunately I am not sure we can easily do this at the moment. All the codegen passes and the usual CodeGen/Passes.cpp pipeline are built from (Machine)FunctionPasses, which are not allowed to create additional functions. I don't see an easy way out there yet.

As for keeping the epilogues in compiler-rt: I do not see how this case is any worse than anything else we have in compiler-rt. If you link with an incompatible/older version there is always a chance that things won't work; this should be the same for epilogues as it is for, say, soft-float intrinsics.

To avoid people accidentally changing an epilogue function, I deliberately chose names that pretty much completely describe the content of the epilogue function: __epilogue_X19_X20_X21_X22 does what you would expect it to do: restore X19, X20, X21 and X22 in that order and return. I don't see how anyone would change the content of that function without also choosing a different function name.

In D15600#324957, @MatzeB wrote:

I looked into producing comdat functions and unfortunately I am not sure we can easily do this at the moment. All the codegen passes and the usual CodeGen/Passes.cpp pipeline are built from (Machine)FunctionPasses, which are not allowed to create additional functions. I don't see an easy way out there yet.

As for keeping the epilogues in compiler-rt: I do not see how this case is any worse than anything else we have in compiler-rt. If you link with an incompatible/older version there is always a chance that things won't work; this should be the same for epilogues as it is for, say, soft-float intrinsics.

To avoid people accidentally changing an epilogue function, I deliberately chose names that pretty much completely describe the content of the epilogue function: __epilogue_X19_X20_X21_X22 does what you would expect it to do: restore X19, X20, X21 and X22 in that order and return. I don't see how anyone would change the content of that function without also choosing a different function name.

Hi Matthias,

My main objection is to requiring people to use compiler-rt as the run-time library.
Right now, it isn't required to use compiler-rt as the run-time library. E.g. on Linux, libgcc is often used as the run-time library.
Sure, for some off-by-default features, such as the sanitizers, compiler-rt is required. But then users explicitly opt in with a command line option.
I'm assuming the goal is that shared-epilogue generation will be on by default. Users on systems linking against the libgcc run-time library will all of a sudden see link failures, without having opted in to a particular feature.
If the right solution would be for these functions to be in all AArch64-supporting run-time libraries, then they ought to be defined in the AArch64 run-time library ABI, which would take quite a bit of time and effort.

In short, I think there are 2 practical ways forward:

1. Only enable shared-epilogue generation on platforms that already demand using compiler-rt as the run-time library. I'm guessing that is Darwin-based platforms, but probably not much else?
2. As suggested by Renato in a previous comment, add a pass early enough in the pipeline that it can still add functions, and have it add the function definitions for all of the shared epilogues. Even if some of the shared epilogues aren't used in the translation unit, or in the finally linked program, that's OK: the linker should eliminate those. The drawback is that object files will be slightly larger. The advantage is that this should work on all platforms.

All-in-all, with the limited knowledge I have of all the details involved, I prefer option 2 if possible.

Revision Contents

Path                                          Size
lib/Target/AArch64/AArch64AsmPrinter.cpp      1 line
lib/Target/AArch64/AArch64FrameLowering.h     8 lines
lib/Target/AArch64/AArch64FrameLowering.cpp   97 lines
lib/Target/AArch64/AArch64InstrInfo.td        4 lines
test/CodeGen/AArch64/shared_epilogues.ll      116 lines

Diff 43096

lib/Target/AArch64/AArch64AsmPrinter.cpp

[465 lines hidden] void AArch64AsmPrinter::EmitInstruction(const MachineInstr *MI) {
   // instruction here.
   case AArch64::TCRETURNri: {
     MCInst TmpInst;
     TmpInst.setOpcode(AArch64::BR);
     TmpInst.addOperand(MCOperand::createReg(MI->getOperand(0).getReg()));
     EmitToStreamer(*OutStreamer, TmpInst);
     return;
   }
+  case AArch64::B_EPILOGUE:
   case AArch64::TCRETURNdi: {
     MCOperand Dest;
     MCInstLowering.lowerOperand(MI->getOperand(0), Dest);
     MCInst TmpInst;
     TmpInst.setOpcode(AArch64::B);
     TmpInst.addOperand(Dest);
     EmitToStreamer(*OutStreamer, TmpInst);
     return;
[75 lines hidden]

lib/Target/AArch64/AArch64FrameLowering.h

[59 lines hidden] public:

   void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
                             RegScavenger *RS) const override;

   /// Returns true if the target will correctly handle shrink wrapping.
   bool enableShrinkWrapping(const MachineFunction &MF) const override {
     return true;
   }
+
+private:
+  /// Try to Replace ldp*, ret sequence with jump to shared epilogue
+  /// code.
+  bool tryJumpToSharedEpilogue(MachineBasicBlock &MBB,
+                               MachineBasicBlock::iterator InsertBefore,
+                               const std::vector<CalleeSavedInfo> &CSI,
+                               const TargetRegisterInfo &TRI) const;
 };

 } // End llvm namespace

 #endif

lib/Target/AArch64/AArch64FrameLowering.cpp

[95 lines hidden]
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegisterScavenging.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/Mangler.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"

 using namespace llvm;

 #define DEBUG_TYPE "frame-info"

 static cl::opt<bool> EnableRedZone("aarch64-redzone",
                                    cl::desc("enable use of redzone on AArch64"),
                                    cl::init(false), cl::Hidden);

+static cl::opt<bool> EnableSharedEpilogues("aarch64-shared-epilogues",
+    cl::desc("Use shared epilogue code in compiler-rt"), cl::Hidden);
+
 STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");

 bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
   if (!EnableRedZone)
     return false;
   // Don't use the red zone if the function explicitly asks us not to.
   // This is typically used for kernel code.
   if (MF.getFunction()->hasFnAttribute(Attribute::NoRedZone))
[402 lines hidden] case AArch64::LDPDpost:
     // FALLTHROUGH
   case AArch64::LDPXi:
   case AArch64::LDPDi:
     if (!isCalleeSavedRegister(MI.getOperand(RtIdx).getReg(), CSRegs) ||
         !isCalleeSavedRegister(MI.getOperand(RtIdx + 1).getReg(), CSRegs) ||
         MI.getOperand(RtIdx + 2).getReg() != AArch64::SP)
       return 0;
     return 2;
+  case AArch64::B_EPILOGUE: {
+    MachineOperand &MO = MI.getOperand(1); // NRegsRestored.
+    return MO.getImm();
+  }
   }
   return 0;
 }

 void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
                                         MachineBasicBlock &MBB) const {
   MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
   MachineFrameInfo *MFI = MF.getFrameInfo();
[60 lines hidden] void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
   //
   // AArch64TargetLowering::LowerCall figures out ArgumentPopSize and keeps
   // it as the 2nd argument of AArch64ISD::TC_RETURN.
   NumBytes += ArgumentPopSize;

   unsigned NumRestores = 0;
   // Move past the restores of the callee-saved registers.
   MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
+  // B_EPILOGUE is terminator and restores regs.
+  if (LastPopI != MBB.end() && LastPopI->getOpcode() == AArch64::B_EPILOGUE)
+    ++LastPopI;
   const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs(&MF);
   MachineBasicBlock::iterator Begin = MBB.begin();
   while (LastPopI != Begin) {
     --LastPopI;
     unsigned Restores = getNumCSRestores(*LastPopI, CSRegs);
     NumRestores += Restores;
     if (Restores == 0) {
       ++LastPopI;
[175 lines hidden] MIB.addReg(Reg2, getPrologueDeath(MF, Reg2))
         .addReg(Reg1, getPrologueDeath(MF, Reg1))
         .addReg(AArch64::SP)
         .addImm(Offset) // [sp, #offset * 8], where factor * 8 is implicit
         .setMIFlag(MachineInstr::FrameSetup);
   }
   return true;
 }

+/// We only put the most common epilogues into compiler-rt. This function
+/// evaluates whether a given CalleeSavedInfo vector produces one of them.
+static bool isSharedEpilogueAvailable(const std::vector<CalleeSavedInfo> &CSI) {
+  unsigned Count = CSI.size();
+  unsigned Prefix = 0;
+  if (Count % 2 != 0 || Count == 0)
+    return false;
+  // We support all variants with LR,FP as first two registers.
+  if (CSI[0].getReg() == AArch64::LR && CSI[1].getReg() == AArch64::FP)
+    Prefix += 2;
+  // Last two registers are X27,X28?
+  if (CSI[Count-2].getReg() == AArch64::X27 &&
+      CSI[Count-1].getReg() == AArch64::X28) {
+    // X27+X28 is allowed as a special case.
+    if (Count == 2)
+      return true;
+    // Otherwise we only support variants that also started with LR,FP.
+    if (Prefix != 2)
+      return false;
+    Count -= 2;
+  }
+  // The remaining registers must be pairs from X19 up to X26.
+  static const unsigned Sequence[] = {
+    AArch64::X19, AArch64::X20, AArch64::X21, AArch64::X22,
+    AArch64::X23, AArch64::X24, AArch64::X25, AArch64::X26,
+  };
+  if (Count-Prefix > array_lengthof(Sequence))
+    return false;
+  for (unsigned I = Prefix; I < Count - Prefix; I += 2) {
+    if (CSI[I].getReg() != Sequence[I-Prefix])
+      return false;
+  }
+  return true;
+}
+
+bool AArch64FrameLowering::tryJumpToSharedEpilogue(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertBefore,
+    const std::vector<CalleeSavedInfo> &CSI,
+    const TargetRegisterInfo &TRI) const {
+  assert(InsertBefore != MBB.end());
+  if (!EnableSharedEpilogues)
+    return false;
+
+  unsigned OpCode = InsertBefore->getOpcode();
+  if (OpCode != AArch64::RET && OpCode != AArch64::RET_ReallyLR)
+    return false;
+
+  if (!isSharedEpilogueAvailable(CSI))
+    return false;
+
+  // Construct label name of epilogue code.
+  SmallString<60> EpilogueName("__epilogue");
+  for (const CalleeSavedInfo &Info : CSI) {
+    unsigned Reg = Info.getReg();
+    EpilogueName += '_';
+    EpilogueName += TRI.getName(Reg);
+  }
+  const MachineFunction &MF = *MBB.getParent();
+  const DataLayout &TD = MF.getDataLayout();
+  SmallString<60> MangledEpilogueName;
+  Mangler::getNameWithPrefix(MangledEpilogueName, EpilogueName, TD);
+
+  // Build jump to shared epilogue code.
+  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
+  DebugLoc DL = InsertBefore->getDebugLoc();
+  const MCInstrDesc &MCID = TII.get(AArch64::B_EPILOGUE);
+  BuildMI(MBB, InsertBefore, DL, MCID)
+      .addExternalSymbol(strdup(MangledEpilogueName.c_str()))
+      .addImm(CSI.size()) // NRegsRestored
+      .copyImplicitOps(&*InsertBefore);
+  // Remove ret.
+  InsertBefore->removeFromParent();
+  return true;
+}

 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
-    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertBefore,
     const std::vector<CalleeSavedInfo> &CSI,
     const TargetRegisterInfo *TRI) const {
   MachineFunction &MF = *MBB.getParent();
   const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
   unsigned Count = CSI.size();
   DebugLoc DL;
   assert((Count & 1) == 0 && "Odd number of callee-saved regs to spill!");

-  if (MI != MBB.end())
-    DL = MI->getDebugLoc();
+  if (InsertBefore != MBB.end()) {
+    DL = InsertBefore->getDebugLoc();
+    if (tryJumpToSharedEpilogue(MBB, InsertBefore, CSI, *TRI))
+      return true;
+  }

   for (unsigned i = 0; i < Count; i += 2) {
     unsigned Reg1 = CSI[i].getReg();
     unsigned Reg2 = CSI[i + 1].getReg();
     // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
     // list to come in sorted by frame index so that we can issue the store
     // pair instructions directly. Assert if we see anything otherwise.
     assert(CSI[i].getFrameIdx() + 1 == CSI[i + 1].getFrameIdx() &&
[29 lines hidden] DEBUG(dbgs() << "CSR restore: (" << TRI->getName(Reg1) << ", "
                  << TRI->getName(Reg2) << ") -> fi#(" << CSI[i].getFrameIdx()
                  << ", " << CSI[i + 1].getFrameIdx() << ")\n");

     // Compute offset: i = 0 => offset = Count - 2; i = 2 => offset = Count - 4;
     // etc.
     const int Offset = (i == Count - 2) ? Count : Count - i - 2;
     assert((Offset >= -64 && Offset <= 63) &&
            "Offset out of bounds for LDP immediate");
-    MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(LdrOpc));
+    MachineInstrBuilder MIB = BuildMI(MBB, InsertBefore, DL, TII.get(LdrOpc));
     if (LdrOpc == AArch64::LDPXpost || LdrOpc == AArch64::LDPDpost)
       MIB.addReg(AArch64::SP, RegState::Define);

     MIB.addReg(Reg2, getDefRegState(true))
         .addReg(Reg1, getDefRegState(true))
         .addReg(AArch64::SP)
         .addImm(Offset); // [sp], #offset * 8 or [sp, #offset * 8]
                          // where the factor * 8 is implicit
[146 lines hidden]

lib/Target/AArch64/AArch64InstrInfo.td

[5,988 lines hidden]

 def : Pat<(AArch64tcret tcGPR64:$dst, (i32 timm:$FPDiff)),
           (TCRETURNri tcGPR64:$dst, imm:$FPDiff)>;
 def : Pat<(AArch64tcret tglobaladdr:$dst, (i32 timm:$FPDiff)),
           (TCRETURNdi texternalsym:$dst, imm:$FPDiff)>;
 def : Pat<(AArch64tcret texternalsym:$dst, (i32 timm:$FPDiff)),
           (TCRETURNdi texternalsym:$dst, imm:$FPDiff)>;

+let isTerminator = 1, isReturn = 1, isBarrier = 1, Uses = [SP] in {
+def B_EPILOGUE : Pseudo<(outs), (ins i64imm:$dst, i32imm:$NRegsRestored),[]>;
+}
+
 include "AArch64InstrAtomics.td"

test/CodeGen/AArch64/shared_epilogues.ll

This file was added.

				; RUN: llc -aarch64-shared-epilogues=1 -o - %s | FileCheck %s --check-prefix CHECK --check-prefix SHARED
				; RUN: llc -aarch64-shared-epilogues=0 -o - %s | FileCheck %s --check-prefix CHECK --check-prefix NOSHAR
				target triple="aarch64--"

				declare void @extfunc()

				; CHECK-LABEL: f0:
				define void @f0() {
				; CHECK: stp x29, x30, [sp, #-16]
				; CHECK: bl extfunc
				call void @extfunc()
				; NOSHAR: ldp x29, x30, [sp], #16
				; NOSHAR-NEXT: ret
				; SHARED: b __epilogue_LR_FP
				; SHARED-NOT: ret
				ret void
				}

				@v0 = external global i32
				@v1 = external global i32
				@v2 = external global i32
				@v3 = external global i32
				@v4 = external global i32
				@v5 = external global i32

				; CHECK-LABEL: f1:
				define void @f1() {
				; CHECK: stp x20, x19, [sp, #-32]!
				; CHECK-NEXT: stp x29, x30, [sp, #16]
				%v0 = load volatile i32, i32* @v0
				; CHECK: bl extfunc
				call void @extfunc()
				store volatile i32 %v0, i32* @v0
				; NOSHAR: ldp x29, x30, [sp, #16]
				; NOSHAR-NEXT: ldp x20, x19, [sp], #32
				; NOSHAR-NEXT: ret
				; SHARED: b __epilogue_LR_FP_X19_X20
				; SHARED-NOT: ret
				ret void
				}

				; CHECK-LABEL: f2:
				define void @f2() {
				; CHECK: stp x28, x27, [sp, #-96]!
				; CHECK-NEXT: stp x26, x25, [sp, #16]
				; CHECK-NEXT: stp x24, x23, [sp, #32]
				; CHECK-NEXT: stp x22, x21, [sp, #48]
				; CHECK-NEXT: stp x20, x19, [sp, #64]
				; CHECK-NEXT: stp x29, x30, [sp, #80]
				%v0 = load volatile i32, i32* @v0
				%v1 = load volatile i32, i32* @v1
				%v2 = load volatile i32, i32* @v2
				%v3 = load volatile i32, i32* @v3
				%v4 = load volatile i32, i32* @v4
				%v5 = load volatile i32, i32* @v5
				; CHECK: bl extfunc
				call void @extfunc()
				store volatile i32 %v0, i32* @v0
				store volatile i32 %v1, i32* @v1
				store volatile i32 %v2, i32* @v2
				store volatile i32 %v3, i32* @v3
				store volatile i32 %v4, i32* @v4
				store volatile i32 %v5, i32* @v5
				; NOSHAR: ldp x29, x30, [sp, #80]
				; NOSHAR-NEXT: ldp x20, x19, [sp, #64]
				; NOSHAR-NEXT: ldp x22, x21, [sp, #48]
				; NOSHAR-NEXT: ldp x24, x23, [sp, #32]
				; NOSHAR-NEXT: ldp x26, x25, [sp, #16]
				; NOSHAR-NEXT: ldp x28, x27, [sp], #96
				; NOSHAR-NEXT: ret
				; SHARED: b __epilogue_LR_FP_X19_X20_X21_X22_X23_X24_X25_X26_X27_X28
				; SHARED-NOT: ret
				ret void
				}

				; CHECK-LABEL: a0:
				define void @a0() {
				call void asm sideeffect "", "~{x19},~{x20},~{x21},~{x22}"()
				; NOSHAR: ldp x20, x19, [sp, #16]
				; NOSHAR-NEXT: ldp x22, x21, [sp], #32
				; NOSHAR-NEXT: ret
				; SHARED: b __epilogue_X19_X20_X21_X22
				; SHARED-NOT: ret
				ret void
				}

				; CHECK-LABEL: a1:
				define void @a1() {
				call void asm sideeffect "", "~{x27},~{x28}"()
				; NOSHAR: ldp x28, x27, [sp], #16
				; NOSHAR-NEXT: ret
				; SHARED: b __epilogue_X27_X28
				; SHARED-NOT: ret
				ret void
				}

				; CHECK-LABEL: a2:
				define void @a2() {
				call void asm sideeffect "", "~{x25},~{x26}"()
				; This epilogue pattern is not present in compiler-rt
				; CHECK-NOT: b __epilogue
				; CHECK: ldp x26, x25, [sp], #16
				; CHECK-NEXT: ret
				ret void
				}

				; CHECK-LABEL: a3:
				define void @a3() {
				call void asm sideeffect "", "~{X19},~{X20},~{x27},~{x28}"()
				; This epilogue pattern is not present in compiler-rt
				; CHECK-NOT: b __epilogue
				; CHECK: ldp x20, x19, [sp, #16]
				; CHECK-NEXT: ldp x28, x27, [sp], #32
				; CHECK-NEXT: ret
				ret void
				}
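
The acceptance rule that the test cases above exercise can be sketched standalone. This is one reading of the comments in isSharedEpilogueAvailable, not the patch's code: register names are plain strings, the function name isSharedEpilogueShape is hypothetical, and every register between the optional LR,FP prefix and the optional X27,X28 suffix is compared against X19..X26. As far as I can tell, the loop in the patch stops at Count - Prefix and compares only the first register of each pair, so it may inspect fewer registers than this sketch does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if the restored-register list has one of the shapes the
// patch comments describe: [LR,FP] + leading pairs of X19..X26 + [X27,X28].
bool isSharedEpilogueShape(const std::vector<std::string> &regs) {
  static const std::vector<std::string> sequence = {
      "X19", "X20", "X21", "X22", "X23", "X24", "X25", "X26"};

  std::size_t end = regs.size();
  if (end == 0 || end % 2 != 0)
    return false;

  // Optional LR,FP prefix.
  std::size_t begin = 0;
  if (regs[0] == "LR" && regs[1] == "FP")
    begin = 2;

  // Optional X27,X28 suffix: allowed alone, or together with LR,FP.
  if (end - begin >= 2 && regs[end - 2] == "X27" && regs[end - 1] == "X28") {
    if (end == 2)
      return true;
    if (begin != 2)
      return false;
    end -= 2;
  }

  // The middle must be a leading run of X19..X26, checked register by register.
  const std::size_t middle = end - begin;
  if (middle > sequence.size())
    return false;
  for (std::size_t i = 0; i < middle; ++i) {
    if (regs[begin + i] != sequence[i])
      return false;
  }
  return true;
}
```

Under this reading, the register lists behind f0, f1, f2, a0 and a1 are accepted, while a2 (X25, X26 alone) and a3 (X19, X20, X27, X28 without LR,FP) are rejected, which matches the SHARED and CHECK-NOT expectations in the test.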