This is an archive of the discontinued LLVM Phabricator instance.

Differential D51479

[AArch64] Implement aarch64_vector_pcs codegen support.
Closed · Public

Authored by sdesmalen on Aug 30 2018, 2:16 AM.

Details

Reviewers

t.p.northover
gberry
thegameg
rengolin
javed.absar
MatzeB

Commits

rG2d77e788f2fd: [AArch64] Implement aarch64_vector_pcs codegen support.
rL342049: [AArch64] Implement aarch64_vector_pcs codegen support.

Summary

This patch adds codegen support for saving and restoring
V8-V23 in functions that use the aarch64_vector_pcs
calling convention attribute, which was added in patch D51477.
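
To make the difference between the two save lists concrete, here is a minimal standalone sketch (plain C++, no LLVM headers) that models the `CSR_AArch64_AAPCS` and `CSR_AArch64_AAVPCS` lists from this revision's `AArch64CallingConvention.td` change and sums their spill sizes. The `SavedReg` struct and all helper names are hypothetical and exist only for illustration; the 8-byte and 16-byte sizes follow the `Size` values the patch uses for GPR/FPR64 and FPR128 spills.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one callee-saved register: its name and spill size.
struct SavedReg {
  std::string Name;
  unsigned SizeInBytes;
};

// Models TableGen's (sequence "Q%u", First, Last); Last is inclusive.
std::vector<SavedReg> sequence(const std::string &Prefix, unsigned First,
                               unsigned Last, unsigned SizeInBytes) {
  std::vector<SavedReg> Regs;
  for (unsigned I = First; I <= Last; ++I)
    Regs.push_back({Prefix + std::to_string(I), SizeInBytes});
  return Regs;
}

// LR, FP, X19-X28: the GPR part shared by both lists.
std::vector<SavedReg> gprSaveList() {
  std::vector<SavedReg> Regs = {{"LR", 8}, {"FP", 8}};
  for (const SavedReg &R : sequence("X", 19, 28, 8))
    Regs.push_back(R);
  return Regs;
}

// AAPCS: only the low 64 bits (D8-D15) of the vector registers are preserved.
std::vector<SavedReg> aapcsSaveList() {
  std::vector<SavedReg> Regs = gprSaveList();
  for (const SavedReg &R : sequence("D", 8, 15, 8))
    Regs.push_back(R);
  return Regs;
}

// Vector PCS: the full 128-bit Q8-Q23 are preserved instead.
std::vector<SavedReg> aavpcsSaveList() {
  std::vector<SavedReg> Regs = gprSaveList();
  for (const SavedReg &R : sequence("Q", 8, 23, 16))
    Regs.push_back(R);
  return Regs;
}

// Worst-case callee-save area if every register in the list is spilled.
unsigned totalSaveBytes(const std::vector<SavedReg> &Regs) {
  unsigned Total = 0;
  for (const SavedReg &R : Regs)
    Total += R.SizeInBytes;
  return Total;
}
```

Under these assumptions the worst-case callee-save area grows from 160 bytes (AAPCS) to 352 bytes (vector PCS); both are already multiples of 16.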

Event Timeline

sdesmalen created this revision. · Aug 30 2018, 2:16 AM

Herald added a reviewer: javed.absar. · Aug 30 2018, 2:16 AM

Herald added a subscriber: kristof.beyls.

sdesmalen added parent revisions: D51478: [AArch64] NFC: Refactoring to prepare for vector PCS., D51477: [AArch64] Add parsing of aarch64_vector_pcs attribute. · Aug 30 2018, 2:17 AM

Thanks for working on this! Other than the comments I left it looks good to me.

lib/Target/AArch64/AArch64FrameLowering.cpp
538	Should this be `% Scale`?
1523	Would this also work?

    for (unsigned Reg : SavedRegs.set_bits())
      Size += TRI.getRegSizeInBits(Reg, MRI);

or something like `std::accumulate`.
1588	Why do we need to recalculate this here? Can't we just update it along the way?
test/CodeGen/AArch64/aarch64-vector-pcs.ll
29 ↗	(On Diff #163283)	I think one nice way of testing this is using MIR. You can write things like:

    $x19 = IMPLICIT_DEF
    $q10 = IMPLICIT_DEF
    $q11 = IMPLICIT_DEF

and use `llc -run-pass prologepilog` or `llc -start-before prologepilog` to run it. You can also look at `test/CodeGen/AArch64/reverse-csr-restore-seq.mir`. You might also want to check the `stack:` entries for the size and alignment like it's done in `test/CodeGen/AArch64/spill-stack-realignment.mir`.

sdesmalen marked 2 inline comments as done. · Aug 31 2018, 2:53 AM

sdesmalen added inline comments.

lib/Target/AArch64/AArch64FrameLowering.cpp
538	I think it should, well spotted!
1523	I tried it and it seems to work, but one of the unit tests fails. I think this is because it exposes a bug where Reg is AArch64::NoRegister (set on line 1515 above to ensure registers are being stored in pairs for MachO's compact unwind format).
1588	No need to recalculate indeed, I've fixed it now.
test/CodeGen/AArch64/aarch64-vector-pcs.ll
29 ↗	(On Diff #163283)	Thanks for the suggestion. When I created the test initially I wasn't convinced that having a .mir test would add much benefit, but I think being able to check the alignment of the objects on the stack is a good reason to do so (rather than observing the effects in the resulting assembly). I've migrated the test to a .mir test in my latest revision.
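
The `% Scale` point raised on line 538 can be illustrated with a small standalone sketch. It is not the LLVM code: `SaveKind`, `scaleFor`, `canFoldLocalStackSize` and `fixupScaledOffset` are hypothetical names, and the opcode switch from the patch is reduced to a three-way enum. The only facts taken from the diff are that Q-register accesses use a scale of 16, the others use 8, and the immediate is bumped by `LocalStackSize / Scale`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a callee-save load/store by register width;
// in the patch this is derived from the opcode (e.g. STPXi, STPDi, STPQi).
enum class SaveKind { GPR64, FPR64, FPR128 };

// Immediate offsets of these instructions are scaled by the access size.
unsigned scaleFor(SaveKind Kind) {
  return Kind == SaveKind::FPR128 ? 16 : 8;
}

// True if LocalStackSize can be folded into a scaled immediate without loss.
bool canFoldLocalStackSize(unsigned LocalStackSize, SaveKind Kind) {
  return LocalStackSize % scaleFor(Kind) == 0;
}

// Models the fix-up: bump the scaled immediate by LocalStackSize bytes.
int64_t fixupScaledOffset(int64_t Imm, unsigned LocalStackSize, SaveKind Kind) {
  unsigned Scale = scaleFor(Kind);
  // Checking `% 8` here would accept 24 for a Q-register access, and
  // 24 / 16 would then silently truncate to 1.
  assert(LocalStackSize % Scale == 0 &&
         "local stack size is not a multiple of Scale");
  return Imm + LocalStackSize / Scale;
}
```

With a local stack size of 32 bytes, an immediate of 2 becomes 6 for an X-register pair but 4 for a Q-register pair, which is why the assertion has to use the same `Scale` as the division.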

Some refactoring to maintain CSStackSize.
Migrated .ll test to a .mir test.

thegameg added a reviewer: MatzeB. · Sep 3 2018, 7:04 AM

thegameg added inline comments.

lib/Target/AArch64/AArch64FrameLowering.cpp
1523	I guess we could check for `!Reg` in the loop, or fix `getRegSizeInBits`. I am not sure if returning 0 in `getRegSizeInBits` would be a good idea but I will let others comment on that. Either way, it could be a good idea to add a comment + an assert in `getRegSizeInBits`.
1588	Could you add a comment explaining that the `8` here is not `Scale` because we only add GPRs?

sdesmalen added inline comments. · Sep 3 2018, 8:10 AM

lib/Target/AArch64/AArch64FrameLowering.cpp
1523	Rather than fixing it up in this loop, I wonder if the condition `PairedReg != AArch64::NoRegister` should be added to the condition on line 1513, so that it is never set as a SavedReg to begin with (an assert is already present in `getRegSizeInBits`). In the test that fails, `test/CodeGen/AArch64/preserve_mostcc.ll`, not all registers are stored in pairs (see `str x15`), which seems to contradict the reason for setting PairedReg according to the comment on line 1510. I don't really know enough about the MachO format to know whether this is a bug, so perhaps someone else can comment on that. @gberry ?
1588	yes, will do!
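
The accumulation idea discussed for line 1523 can be sketched outside LLVM as follows. This is an illustration under stated assumptions, not the committed code: the register numbering, `regSizeInBytes`, `calleeSavedStackSize`, `alignTo16` and `hasFreeSpace` are all hypothetical, and `SavedRegs` is modelled as a plain list of register numbers rather than a `BitVector`. It shows the `std::accumulate` variant with an explicit `NoRegister` check, plus the round-up to 16 bytes that the patch performs with `alignTo(CSStackSize, 16)`.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

constexpr unsigned NoRegister = 0;

// Hypothetical numbering for this sketch only: 1-12 are 64-bit registers,
// anything above is a 128-bit Q register.
unsigned regSizeInBytes(unsigned Reg) {
  assert(Reg != NoRegister && "NoRegister has no size");
  return Reg <= 12 ? 8 : 16;
}

// Sum the spill sizes of all saved registers, skipping NoRegister entries.
unsigned calleeSavedStackSize(const std::vector<unsigned> &SavedRegs) {
  return std::accumulate(SavedRegs.begin(), SavedRegs.end(), 0u,
                         [](unsigned Size, unsigned Reg) {
                           return Reg == NoRegister
                                      ? Size
                                      : Size + regSizeInBytes(Reg);
                         });
}

// Round up to the 16-byte stack alignment, as alignTo(CSStackSize, 16) does.
unsigned alignTo16(unsigned Size) { return (Size + 15) / 16 * 16; }

// Padding introduced by the round-up is what the patch records as free space.
bool hasFreeSpace(unsigned CSStackSize) {
  return alignTo16(CSStackSize) != CSStackSize;
}
```

Skipping `NoRegister` inside the loop is only one of the options raised in the thread; keeping it out of `SavedRegs` in the first place makes the check unnecessary.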

Calculate CSStackSize by accumulating reg-size for each saved register.
Only sets PairedReg in SavedRegs when it is not AArch64::NoRegister.
Removed trailing whitespace from test.

thegameg added inline comments. · Sep 4 2018, 2:51 AM

lib/Target/AArch64/AArch64FrameLowering.cpp

1523	Right, it makes sense not to add it to `SavedRegs`. For compact unwinding we prefer storing registers in pairs to avoid encoding every single register in the unwind info, but in this case we definitely don't have support for `mostcc`. The supported registers are x19 to x28 and d8 to d15, so in this case we will fall back on DWARF. I think this should be enough to fix it:

     // MachO's compact unwind format relies on all registers being stored in
     // pairs.
     // FIXME: the usual format is actually better if unwinding isn't needed.
-    if (produceCompactUnwindFrame(MF) && !SavedRegs.test(PairedReg)) {
+    if (PairedReg != AArch64::NoRegister && produceCompactUnwindFrame(MF) &&
+        !SavedRegs.test(PairedReg)) {
       SavedRegs.set(PairedReg);
       if (AArch64::GPR64RegClass.contains(PairedReg) &&
           !RegInfo->isReservedReg(MF, PairedReg))

LGTM, thanks!
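
Why `PairedReg` can be `AArch64::NoRegister` at all is easier to see in a reduced model. The sketch below is a simplification with hypothetical names (`savedWithPairs`, `GuardNoRegister`); it keeps only two details from the code under review: the callee-saved list is zero-terminated, and the partner of entry `i` is looked up as `CSRegs[i ^ 1]`. For a list with an odd number of registers, the partner of the last entry is the terminator itself.

```cpp
#include <cassert>
#include <set>
#include <vector>

constexpr unsigned NoRegister = 0;

// CSRegs is a zero-terminated list, iterated like `for (i = 0; CSRegs[i]; ++i)`.
// Used holds the registers the function clobbers. The result models SavedRegs
// when every used register also forces its partner to be saved.
std::set<unsigned> savedWithPairs(const std::vector<unsigned> &CSRegs,
                                  const std::set<unsigned> &Used,
                                  bool GuardNoRegister) {
  assert(!CSRegs.empty() && CSRegs.back() == NoRegister);
  std::set<unsigned> Saved;
  for (unsigned I = 0; CSRegs[I]; ++I) {
    unsigned Reg = CSRegs[I];
    if (!Used.count(Reg))
      continue;
    Saved.insert(Reg);
    // For an even I this reads the next entry, which may be the terminator.
    unsigned PairedReg = CSRegs[I ^ 1];
    if (GuardNoRegister && PairedReg == NoRegister)
      continue;
    Saved.insert(PairedReg);
  }
  return Saved;
}
```

With the list `{19, 20, 21, NoRegister}` and only register 21 in use, the unguarded variant adds `NoRegister` to the saved set, while the guarded variant saves just 21.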

lib/Target/AArch64/AArch64FrameLowering.cpp
1523	You were faster than me commenting on this :) Thanks!

This revision is now accepted and ready to land. · Sep 4 2018, 2:55 AM

Closed by commit rL342049: [AArch64] Implement aarch64_vector_pcs codegen support. (authored by s.desmalen). · Sep 12 2018, 5:11 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/AArch64/AArch64CallingConvention.td	8 lines
lib/Target/AArch64/AArch64FrameLowering.cpp	115 lines
lib/Target/AArch64/AArch64RegisterInfo.cpp	6 lines
test/CodeGen/AArch64/aarch64-vector-pcs.mir	253 lines

Diff 163493

lib/Target/AArch64/AArch64CallingConvention.td

	[282 lines not shown]
	// It would be better to model its preservation semantics properly (create a			// It would be better to model its preservation semantics properly (create a
	// vreg on entry, use it in RET & tail call generation; make that vreg def if we			// vreg on entry, use it in RET & tail call generation; make that vreg def if we
	// end up saving LR as part of a call frame). Watch this space...			// end up saving LR as part of a call frame). Watch this space...
	def CSR_AArch64_AAPCS : CalleeSavedRegs<(add LR, FP, X19, X20, X21, X22,			def CSR_AArch64_AAPCS : CalleeSavedRegs<(add LR, FP, X19, X20, X21, X22,
	X23, X24, X25, X26, X27, X28,			X23, X24, X25, X26, X27, X28,
	D8, D9, D10, D11,			D8, D9, D10, D11,
	D12, D13, D14, D15)>;			D12, D13, D14, D15)>;

				// AArch64 PCS for vector functions (VPCS)
				// must (additionally) preserve full Q8-Q23 registers
				def CSR_AArch64_AAVPCS : CalleeSavedRegs<(add LR, FP, X19, X20, X21, X22,
				X23, X24, X25, X26, X27, X28,
				(sequence "Q%u", 8, 23))>;

	// Constructors and destructors return 'this' in the iOS 64-bit C++ ABI; since			// Constructors and destructors return 'this' in the iOS 64-bit C++ ABI; since
	// 'this' and the pointer return value are both passed in X0 in these cases,			// 'this' and the pointer return value are both passed in X0 in these cases,
	// this can be partially modelled by treating X0 as a callee-saved register;			// this can be partially modelled by treating X0 as a callee-saved register;
	// only the resulting RegMask is used; the SaveList is ignored			// only the resulting RegMask is used; the SaveList is ignored
	//			//
	// (For generic ARM 64-bit ABI code, clang will not generate constructors or			// (For generic ARM 64-bit ABI code, clang will not generate constructors or
	// destructors with 'this' returns, so this RegMask will not be used in that			// destructors with 'this' returns, so this RegMask will not be used in that
	// case)			// case)
	[58 lines not shown]
	def CSR_AArch64_AllRegs_SCS			def CSR_AArch64_AllRegs_SCS
	: CalleeSavedRegs<(add CSR_AArch64_AllRegs, X18)>;			: CalleeSavedRegs<(add CSR_AArch64_AllRegs, X18)>;
	def CSR_AArch64_CXX_TLS_Darwin_SCS			def CSR_AArch64_CXX_TLS_Darwin_SCS
	: CalleeSavedRegs<(add CSR_AArch64_CXX_TLS_Darwin, X18)>;			: CalleeSavedRegs<(add CSR_AArch64_CXX_TLS_Darwin, X18)>;
	def CSR_AArch64_AAPCS_SwiftError_SCS			def CSR_AArch64_AAPCS_SwiftError_SCS
	: CalleeSavedRegs<(add CSR_AArch64_AAPCS_SwiftError, X18)>;			: CalleeSavedRegs<(add CSR_AArch64_AAPCS_SwiftError, X18)>;
	def CSR_AArch64_RT_MostRegs_SCS			def CSR_AArch64_RT_MostRegs_SCS
	: CalleeSavedRegs<(add CSR_AArch64_RT_MostRegs, X18)>;			: CalleeSavedRegs<(add CSR_AArch64_RT_MostRegs, X18)>;
				def CSR_AArch64_AAVPCS_SCS
				: CalleeSavedRegs<(add CSR_AArch64_AAVPCS, X18)>;
	def CSR_AArch64_AAPCS_SCS			def CSR_AArch64_AAPCS_SCS
	: CalleeSavedRegs<(add CSR_AArch64_AAPCS, X18)>;			: CalleeSavedRegs<(add CSR_AArch64_AAPCS, X18)>;

lib/Target/AArch64/AArch64FrameLowering.cpp

[429 lines not shown]	static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
case AArch64::STPXi:		case AArch64::STPXi:
NewOpc = AArch64::STPXpre;		NewOpc = AArch64::STPXpre;
Scale = 8;		Scale = 8;
break;		break;
case AArch64::STPDi:		case AArch64::STPDi:
NewOpc = AArch64::STPDpre;		NewOpc = AArch64::STPDpre;
Scale = 8;		Scale = 8;
break;		break;
		case AArch64::STPQi:
		NewOpc = AArch64::STPQpre;
		Scale = 16;
		break;
case AArch64::STRXui:		case AArch64::STRXui:
NewOpc = AArch64::STRXpre;		NewOpc = AArch64::STRXpre;
break;		break;
case AArch64::STRDui:		case AArch64::STRDui:
NewOpc = AArch64::STRDpre;		NewOpc = AArch64::STRDpre;
break;		break;
		case AArch64::STRQui:
		NewOpc = AArch64::STRQpre;
		break;
case AArch64::LDPXi:		case AArch64::LDPXi:
NewOpc = AArch64::LDPXpost;		NewOpc = AArch64::LDPXpost;
Scale = 8;		Scale = 8;
break;		break;
case AArch64::LDPDi:		case AArch64::LDPDi:
NewOpc = AArch64::LDPDpost;		NewOpc = AArch64::LDPDpost;
Scale = 8;		Scale = 8;
break;		break;
		case AArch64::LDPQi:
		NewOpc = AArch64::LDPQpost;
		Scale = 16;
		break;
case AArch64::LDRXui:		case AArch64::LDRXui:
NewOpc = AArch64::LDRXpost;		NewOpc = AArch64::LDRXpost;
break;		break;
case AArch64::LDRDui:		case AArch64::LDRDui:
NewOpc = AArch64::LDRDpost;		NewOpc = AArch64::LDRDpost;
break;		break;
		case AArch64::LDRQui:
		NewOpc = AArch64::LDRQpost;
		break;
}		}

MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));		MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
MIB.addReg(AArch64::SP, RegState::Define);		MIB.addReg(AArch64::SP, RegState::Define);

// Copy all operands other than the immediate offset.		// Copy all operands other than the immediate offset.
unsigned OpndIdx = 0;		unsigned OpndIdx = 0;
for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;		for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
[34 lines not shown]	static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
case AArch64::STPDi:		case AArch64::STPDi:
case AArch64::STRDui:		case AArch64::STRDui:
case AArch64::LDPXi:		case AArch64::LDPXi:
case AArch64::LDRXui:		case AArch64::LDRXui:
case AArch64::LDPDi:		case AArch64::LDPDi:
case AArch64::LDRDui:		case AArch64::LDRDui:
Scale = 8;		Scale = 8;
break;		break;
		case AArch64::STPQi:
		case AArch64::STRQui:
		case AArch64::LDPQi:
		case AArch64::LDRQui:
		Scale = 16;
		break;
default:		default:
llvm_unreachable("Unexpected callee-save save/restore opcode!");		llvm_unreachable("Unexpected callee-save save/restore opcode!");
}		}

unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;		unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&		assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
"Unexpected base register in callee-save save/restore instruction!");		"Unexpected base register in callee-save save/restore instruction!");
// Last operand is immediate offset that needs fixing.		// Last operand is immediate offset that needs fixing.
MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);		MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
// All generated opcodes have scaled offsets.		// All generated opcodes have scaled offsets.
assert(LocalStackSize % 8 == 0);		assert(LocalStackSize % Scale == 0);
		thegameg: Should this be `% Scale`?
		sdesmalen: I think it should, well spotted!
OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);		OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
}		}

static void adaptForLdStOpt(MachineBasicBlock &MBB,		static void adaptForLdStOpt(MachineBasicBlock &MBB,
MachineBasicBlock::iterator FirstSPPopI,		MachineBasicBlock::iterator FirstSPPopI,
MachineBasicBlock::iterator LastPopI) {		MachineBasicBlock::iterator LastPopI) {
// Sometimes (when we restore in the same order as we save), we can end up		// Sometimes (when we restore in the same order as we save), we can end up
// with code like this:		// with code like this:
[616 lines not shown]

namespace {		namespace {

struct RegPairInfo {		struct RegPairInfo {
unsigned Reg1 = AArch64::NoRegister;		unsigned Reg1 = AArch64::NoRegister;
unsigned Reg2 = AArch64::NoRegister;		unsigned Reg2 = AArch64::NoRegister;
int FrameIdx;		int FrameIdx;
int Offset;		int Offset;
enum RegType { GPR, FPR64 } Type;		enum RegType { GPR, FPR64, FPR128 } Type;

RegPairInfo() = default;		RegPairInfo() = default;

bool isPaired() const { return Reg2 != AArch64::NoRegister; }		bool isPaired() const { return Reg2 != AArch64::NoRegister; }
};		};

} // end anonymous namespace		} // end anonymous namespace

[21 lines not shown]	static void computeCalleeSaveRegisterPairs(
for (unsigned i = 0; i < Count; ++i) {		for (unsigned i = 0; i < Count; ++i) {
RegPairInfo RPI;		RegPairInfo RPI;
RPI.Reg1 = CSI[i].getReg();		RPI.Reg1 = CSI[i].getReg();

if (AArch64::GPR64RegClass.contains(RPI.Reg1))		if (AArch64::GPR64RegClass.contains(RPI.Reg1))
RPI.Type = RegPairInfo::GPR;		RPI.Type = RegPairInfo::GPR;
else if (AArch64::FPR64RegClass.contains(RPI.Reg1))		else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
RPI.Type = RegPairInfo::FPR64;		RPI.Type = RegPairInfo::FPR64;
		else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
		RPI.Type = RegPairInfo::FPR128;
else		else
llvm_unreachable("Unsupported register class.");		llvm_unreachable("Unsupported register class.");

// Add the next reg to the pair if it is in the same register class.		// Add the next reg to the pair if it is in the same register class.
if (i + 1 < Count) {		if (i + 1 < Count) {
unsigned NextReg = CSI[i + 1].getReg();		unsigned NextReg = CSI[i + 1].getReg();
switch (RPI.Type) {		switch (RPI.Type) {
case RegPairInfo::GPR:		case RegPairInfo::GPR:
if (AArch64::GPR64RegClass.contains(NextReg))		if (AArch64::GPR64RegClass.contains(NextReg))
RPI.Reg2 = NextReg;		RPI.Reg2 = NextReg;
break;		break;
case RegPairInfo::FPR64:		case RegPairInfo::FPR64:
if (AArch64::FPR64RegClass.contains(NextReg))		if (AArch64::FPR64RegClass.contains(NextReg))
RPI.Reg2 = NextReg;		RPI.Reg2 = NextReg;
break;		break;
		case RegPairInfo::FPR128:
		if (AArch64::FPR128RegClass.contains(NextReg))
		RPI.Reg2 = NextReg;
		break;
}		}
}		}

// If either of the registers to be saved is the lr register, it means that		// If either of the registers to be saved is the lr register, it means that
// we also need to save lr in the shadow call stack.		// we also need to save lr in the shadow call stack.
if ((RPI.Reg1 == AArch64::LR || RPI.Reg2 == AArch64::LR) &&		if ((RPI.Reg1 == AArch64::LR || RPI.Reg2 == AArch64::LR) &&
MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack)) {		MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack)) {
if (!MF.getSubtarget<AArch64Subtarget>().isX18Reserved())		if (!MF.getSubtarget<AArch64Subtarget>().isX18Reserved())
[17 lines not shown]	assert((!produceCompactUnwindFrame(MF) ||
CC == CallingConv::PreserveMost ||		CC == CallingConv::PreserveMost ||
(RPI.isPaired() &&		(RPI.isPaired() &&
((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||		((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
RPI.Reg1 + 1 == RPI.Reg2))) &&		RPI.Reg1 + 1 == RPI.Reg2))) &&
"Callee-save registers not saved as adjacent register pair!");		"Callee-save registers not saved as adjacent register pair!");

RPI.FrameIdx = CSI[i].getFrameIdx();		RPI.FrameIdx = CSI[i].getFrameIdx();

if (Count * 8 != AFI->getCalleeSavedStackSize() && !RPI.isPaired()) {		int Scale = RPI.Type == RegPairInfo::FPR128 ? 16 : 8;
		Offset -= RPI.isPaired() ? 2 * Scale : Scale;

// Round up size of non-pair to pair size if we need to pad the		// Round up size of non-pair to pair size if we need to pad the
// callee-save area to ensure 16-byte alignment.		// callee-save area to ensure 16-byte alignment.
Offset -= 16;		if (AFI->hasCalleeSaveStackFreeSpace() &&
		RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired()) {
		Offset -= 8;
		assert(Offset % 16 == 0);
assert(MFI.getObjectAlignment(RPI.FrameIdx) <= 16);		assert(MFI.getObjectAlignment(RPI.FrameIdx) <= 16);
MFI.setObjectAlignment(RPI.FrameIdx, 16);		MFI.setObjectAlignment(RPI.FrameIdx, 16);
AFI->setCalleeSaveStackHasFreeSpace(true);		}
} else
Offset -= RPI.isPaired() ? 16 : 8;		assert(Offset % Scale == 0);
assert(Offset % 8 == 0);		RPI.Offset = Offset / Scale;
RPI.Offset = Offset / 8;
assert((RPI.Offset >= -64 && RPI.Offset <= 63) &&		assert((RPI.Offset >= -64 && RPI.Offset <= 63) &&
"Offset out of bounds for LDP/STP immediate");		"Offset out of bounds for LDP/STP immediate");

RegPairs.push_back(RPI);		RegPairs.push_back(RPI);
if (RPI.isPaired())		if (RPI.isPaired())
++i;		++i;
}		}
}		}
[49 lines not shown]	case RegPairInfo::GPR:
Size = 8;		Size = 8;
Align = 8;		Align = 8;
break;		break;
case RegPairInfo::FPR64:		case RegPairInfo::FPR64:
StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;		StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
Size = 8;		Size = 8;
Align = 8;		Align = 8;
break;		break;
		case RegPairInfo::FPR128:
		StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
		Size = 16;
		Align = 16;
		break;
}		}
LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);		LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);		if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
dbgs() << ") -> fi#(" << RPI.FrameIdx;		dbgs() << ") -> fi#(" << RPI.FrameIdx;
if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;		if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
dbgs() << ")\n");		dbgs() << ")\n");

MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));		MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
[55 lines not shown]	case RegPairInfo::GPR:
Size = 8;		Size = 8;
Align = 8;		Align = 8;
break;		break;
case RegPairInfo::FPR64:		case RegPairInfo::FPR64:
LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;		LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
Size = 8;		Size = 8;
Align = 8;		Align = 8;
break;		break;
		case RegPairInfo::FPR128:
		LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
		Size = 16;
		Align = 16;
		break;
}		}
LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);		LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);		if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
dbgs() << ") -> fi#(" << RPI.FrameIdx;		dbgs() << ") -> fi#(" << RPI.FrameIdx;
if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;		if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
dbgs() << ")\n");		dbgs() << ")\n");

MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(LdrOpc));		MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(LdrOpc));
[50 lines not shown]	void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,

MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs(&MF);		const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs(&MF);

unsigned BasePointerReg = RegInfo->hasBasePointer(MF)		unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
? RegInfo->getBaseRegister()		? RegInfo->getBaseRegister()
: (unsigned)AArch64::NoRegister;		: (unsigned)AArch64::NoRegister;

unsigned SpillEstimate = SavedRegs.count();
for (unsigned i = 0; CSRegs[i]; ++i) {
unsigned Reg = CSRegs[i];
unsigned PairedReg = CSRegs[i ^ 1];
if (Reg == BasePointerReg)
SpillEstimate++;
if (produceCompactUnwindFrame(MF) && !SavedRegs.test(PairedReg))
SpillEstimate++;
}
SpillEstimate += 2; // Conservatively include FP+LR in the estimate
unsigned StackEstimate = MFI.estimateStackSize(MF) + 8 * SpillEstimate;

// The frame record needs to be created by saving the appropriate registers
if (hasFP(MF) || windowsRequiresStackProbe(MF, StackEstimate)) {
SavedRegs.set(AArch64::FP);
SavedRegs.set(AArch64::LR);
}

unsigned ExtraCSSpill = 0;		unsigned ExtraCSSpill = 0;
// Figure out which callee-saved registers to save/restore.		// Figure out which callee-saved registers to save/restore.
for (unsigned i = 0; CSRegs[i]; ++i) {		for (unsigned i = 0; CSRegs[i]; ++i) {
const unsigned Reg = CSRegs[i];		const unsigned Reg = CSRegs[i];

// Add the base pointer register to SavedRegs if it is callee-save.		// Add the base pointer register to SavedRegs if it is callee-save.
if (Reg == BasePointerReg)		if (Reg == BasePointerReg)
SavedRegs.set(Reg);		SavedRegs.set(Reg);
[15 lines not shown]	for (unsigned i = 0; CSRegs[i]; ++i) {
if (produceCompactUnwindFrame(MF) && !SavedRegs.test(PairedReg)) {		if (produceCompactUnwindFrame(MF) && !SavedRegs.test(PairedReg)) {
SavedRegs.set(PairedReg);		SavedRegs.set(PairedReg);
if (AArch64::GPR64RegClass.contains(PairedReg) &&		if (AArch64::GPR64RegClass.contains(PairedReg) &&
!RegInfo->isReservedReg(MF, PairedReg))		!RegInfo->isReservedReg(MF, PairedReg))
ExtraCSSpill = PairedReg;		ExtraCSSpill = PairedReg;
}		}
}		}

		// Calculates the callee saved stack size.
		unsigned NumSavedRegs = SavedRegs.count();
		unsigned CSStackSize = 0;
		thegameg: Would this also work? `for (unsigned Reg : SavedRegs.set_bits()) Size += TRI.getRegSizeInBits(Reg, MRI);` or something like `std::accumulate`.
		sdesmalen: I tried it and it seems to work, but one of the unit tests fails. I think this is because it exposes a bug where Reg is AArch64::NoRegister (set on line 1515 above to ensure registers are being stored in pairs for MachO's compact unwind format).
		thegameg: I guess we could check for `!Reg` in the loop, or fix `getRegSizeInBits`. I am not sure if returning 0 in `getRegSizeInBits` would be a good idea but I will let others comment on that. Either way, it could be a good idea to add a comment + an assert in `getRegSizeInBits`.
		sdesmalen: Rather than fixing it up in this loop, I wonder if the condition `PairedReg != AArch64::NoRegister` should be added to the condition on line 1513, so that it is never set as a SavedReg to begin with (an assert is already present in `getRegSizeInBits`). In the test that fails, `test/CodeGen/AArch64/preserve_mostcc.ll`, not all registers are stored in pairs (see `str x15`), which seems to contradict the reason for setting PairedReg according to the comment on line 1510. I don't really know enough about the MachO format to know whether this is a bug, so perhaps someone else can comment on that. @gberry ?
		thegameg: Right, it makes sense not to add it to `SavedRegs`. For compact unwinding we prefer storing registers in pairs to avoid encoding every single register in the unwind info, but in this case we definitely don't have support for `mostcc`. The supported registers are x19 to x28 and d8 to d15, so in this case we will fall back on DWARF. I think this should be enough to fix it:
		     // MachO's compact unwind format relies on all registers being stored in
		     // pairs.
		     // FIXME: the usual format is actually better if unwinding isn't needed.
		-    if (produceCompactUnwindFrame(MF) && !SavedRegs.test(PairedReg)) {
		+    if (PairedReg != AArch64::NoRegister && produceCompactUnwindFrame(MF) &&
		+        !SavedRegs.test(PairedReg)) {
		       SavedRegs.set(PairedReg);
		       if (AArch64::GPR64RegClass.contains(PairedReg) &&
		           !RegInfo->isReservedReg(MF, PairedReg))
		thegameg: You were faster than me commenting on this :) Thanks!
		for (unsigned i = 0; CSRegs[i]; ++i)
		if (SavedRegs.test(CSRegs[i]))
		CSStackSize += (AArch64::FPR64RegClass.contains(CSRegs[i]) ||
		AArch64::GPR64RegClass.contains(CSRegs[i]))
		? 8 : 16;

		// The frame record needs to be created by saving the appropriate registers
		unsigned EstimatedStackSize = MFI.estimateStackSize(MF);
		if (hasFP(MF) ||
		windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
		SavedRegs.set(AArch64::FP);
		SavedRegs.set(AArch64::LR);
		}

LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nUsed CSRs:";		LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nUsed CSRs:";
for (unsigned Reg		for (unsigned Reg
: SavedRegs.set_bits()) dbgs()		: SavedRegs.set_bits()) dbgs()
<< ' ' << printReg(Reg, RegInfo);		<< ' ' << printReg(Reg, RegInfo);
dbgs() << "\n";);		dbgs() << "\n";);

// If any callee-saved registers are used, the frame cannot be eliminated.		// If any callee-saved registers are used, the frame cannot be eliminated.
unsigned NumRegsSpilled = SavedRegs.count();		bool CanEliminateFrame = SavedRegs.count() == 0;
bool CanEliminateFrame = NumRegsSpilled == 0;

// The CSR spill slots have not been allocated yet, so estimateStackSize		// The CSR spill slots have not been allocated yet, so estimateStackSize
// won't include them.		// won't include them.
unsigned CFSize = MFI.estimateStackSize(MF) + 8 * NumRegsSpilled;
LLVM_DEBUG(dbgs() << "Estimated stack frame size: " << CFSize << " bytes.\n");
unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);		unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
bool BigStack = (CFSize > EstimatedStackSizeLimit);		bool BigStack = (EstimatedStackSize + CSStackSize) > EstimatedStackSizeLimit;
if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))		if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
AFI->setHasStackFrame(true);		AFI->setHasStackFrame(true);

// Estimate if we might need to scavenge a register at some point in order		// Estimate if we might need to scavenge a register at some point in order
// to materialize a stack offset. If so, either spill one additional		// to materialize a stack offset. If so, either spill one additional
// callee-saved register or reserve a special spill slot to facilitate		// callee-saved register or reserve a special spill slot to facilitate
// register scavenging. If we already spilled an extra callee-saved register		// register scavenging. If we already spilled an extra callee-saved register
// above to keep the number of spills even, we don't need to do anything else		// above to keep the number of spills even, we don't need to do anything else
// here.		// here.
if (BigStack) {		if (BigStack) {
if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {		if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)		LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
<< " to get a scratch register.\n");		<< " to get a scratch register.\n");
SavedRegs.set(UnspilledCSGPR);		SavedRegs.set(UnspilledCSGPR);
// MachO's compact unwind format relies on all registers being stored in		// MachO's compact unwind format relies on all registers being stored in
// pairs, so if we need to spill one extra for BigStack, then we need to		// pairs, so if we need to spill one extra for BigStack, then we need to
// store the pair.		// store the pair.
if (produceCompactUnwindFrame(MF))		if (produceCompactUnwindFrame(MF))
SavedRegs.set(UnspilledCSGPRPaired);		SavedRegs.set(UnspilledCSGPRPaired);
ExtraCSSpill = UnspilledCSGPRPaired;		ExtraCSSpill = UnspilledCSGPRPaired;
NumRegsSpilled = SavedRegs.count();
}		}

// If we didn't find an extra callee-saved register to spill, create		// If we didn't find an extra callee-saved register to spill, create
// an emergency spill slot.		// an emergency spill slot.
if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {		if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();		const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
const TargetRegisterClass &RC = AArch64::GPR64RegClass;		const TargetRegisterClass &RC = AArch64::GPR64RegClass;
unsigned Size = TRI->getSpillSize(RC);		unsigned Size = TRI->getSpillSize(RC);
unsigned Align = TRI->getSpillAlignment(RC);		unsigned Align = TRI->getSpillAlignment(RC);
int FI = MFI.CreateStackObject(Size, Align, false);		int FI = MFI.CreateStackObject(Size, Align, false);
RS->addScavengingFrameIndex(FI);		RS->addScavengingFrameIndex(FI);
LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI		LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
<< " as the emergency spill slot.\n");		<< " as the emergency spill slot.\n");
}		}
}		}

		// Recalculate the size of the CSRs
		CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
		thegameg: Why do we need to recalculate this here? Can't we just update it along the way?
		sdesmalen: No need to recalculate indeed, I've fixed it now.
		thegameg: Could you add a comment explaining that the `8` here is not `Scale` because we only add GPRs?
		sdesmalen: yes, will do!
		unsigned AlignedCSStackSize = alignTo(CSStackSize, 16);
		LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
		<< EstimatedStackSize + AlignedCSStackSize
		<< " bytes.\n");

// Round up to register pair alignment to avoid additional SP adjustment		// Round up to register pair alignment to avoid additional SP adjustment
// instructions.		// instructions.
AFI->setCalleeSavedStackSize(alignTo(8 * NumRegsSpilled, 16));		AFI->setCalleeSavedStackSize(AlignedCSStackSize);
		AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
}		}

bool AArch64FrameLowering::enableStackSlotScavenging(		bool AArch64FrameLowering::enableStackSlotScavenging(
const MachineFunction &MF) const {		const MachineFunction &MF) const {
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
return AFI->hasCalleeSaveStackFreeSpace();		return AFI->hasCalleeSaveStackFreeSpace();
}		}
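
The size bookkeeping in the hunk above can be illustrated with a small standalone sketch. It is not the code from this patch: `alignUp`, `CalleeSaveArea` and `computeCalleeSaveArea` are hypothetical names, and the sketch assumes 8 bytes per saved GPR64 and 16 bytes per saved FPR128, with the total rounded up to 16 bytes as `alignTo(CSStackSize, 16)` does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM's alignTo): round Value up to a multiple of Align.
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

struct CalleeSaveArea {
  uint64_t RawSize;     // sum of the individual spill sizes, in bytes
  uint64_t AlignedSize; // RawSize rounded up to a 16-byte boundary
  bool HasFreeSpace;    // true when the rounding left an unused gap
};

// Assumption: each saved GPR64 takes 8 bytes, each saved FPR128 takes 16 bytes.
inline CalleeSaveArea computeCalleeSaveArea(unsigned NumGPR64,
                                            unsigned NumFPR128) {
  uint64_t Raw = 8ull * NumGPR64 + 16ull * NumFPR128;
  uint64_t Aligned = alignUp(Raw, 16);
  return {Raw, Aligned, Aligned != Raw};
}
```

Under these assumptions, one GPR plus two Q registers gives a raw size of 40 bytes and an aligned size of 48 bytes, leaving an 8-byte gap; that gap is the kind of free space the `setCalleeSaveStackHasFreeSpace` flag reports.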

lib/Target/AArch64/AArch64RegisterInfo.cpp

[... 43 lines not shown ...]
AArch64RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
assert(MF && "Invalid MachineFunction pointer.");		assert(MF && "Invalid MachineFunction pointer.");
if (MF->getFunction().getCallingConv() == CallingConv::GHC)		if (MF->getFunction().getCallingConv() == CallingConv::GHC)
// GHC set of callee saved regs is empty as all those regs are		// GHC set of callee saved regs is empty as all those regs are
// used for passing STG regs around		// used for passing STG regs around
return CSR_AArch64_NoRegs_SaveList;		return CSR_AArch64_NoRegs_SaveList;
if (MF->getFunction().getCallingConv() == CallingConv::AnyReg)		if (MF->getFunction().getCallingConv() == CallingConv::AnyReg)
return CSR_AArch64_AllRegs_SaveList;		return CSR_AArch64_AllRegs_SaveList;
if (MF->getFunction().getCallingConv() == CallingConv::AArch64_VectorCall)		if (MF->getFunction().getCallingConv() == CallingConv::AArch64_VectorCall)
// FIXME: default to AAPCS until we add full support.		return CSR_AArch64_AAVPCS_SaveList;
return CSR_AArch64_AAPCS_SaveList;
if (MF->getFunction().getCallingConv() == CallingConv::CXX_FAST_TLS)		if (MF->getFunction().getCallingConv() == CallingConv::CXX_FAST_TLS)
return MF->getInfo<AArch64FunctionInfo>()->isSplitCSR() ?		return MF->getInfo<AArch64FunctionInfo>()->isSplitCSR() ?
CSR_AArch64_CXX_TLS_Darwin_PE_SaveList :		CSR_AArch64_CXX_TLS_Darwin_PE_SaveList :
CSR_AArch64_CXX_TLS_Darwin_SaveList;		CSR_AArch64_CXX_TLS_Darwin_SaveList;
if (MF->getSubtarget<AArch64Subtarget>().getTargetLowering()		if (MF->getSubtarget<AArch64Subtarget>().getTargetLowering()
->supportSwiftError() &&		->supportSwiftError() &&
MF->getFunction().getAttributes().hasAttrSomewhere(		MF->getFunction().getAttributes().hasAttrSomewhere(
Attribute::SwiftError))		Attribute::SwiftError))
[... 34 lines not shown ...]
if (CC == CallingConv::GHC)
// This is academic because all GHC calls are (supposed to be) tail calls		// This is academic because all GHC calls are (supposed to be) tail calls
return SCS ? CSR_AArch64_NoRegs_SCS_RegMask : CSR_AArch64_NoRegs_RegMask;		return SCS ? CSR_AArch64_NoRegs_SCS_RegMask : CSR_AArch64_NoRegs_RegMask;
if (CC == CallingConv::AnyReg)		if (CC == CallingConv::AnyReg)
return SCS ? CSR_AArch64_AllRegs_SCS_RegMask : CSR_AArch64_AllRegs_RegMask;		return SCS ? CSR_AArch64_AllRegs_SCS_RegMask : CSR_AArch64_AllRegs_RegMask;
if (CC == CallingConv::CXX_FAST_TLS)		if (CC == CallingConv::CXX_FAST_TLS)
return SCS ? CSR_AArch64_CXX_TLS_Darwin_SCS_RegMask		return SCS ? CSR_AArch64_CXX_TLS_Darwin_SCS_RegMask
: CSR_AArch64_CXX_TLS_Darwin_RegMask;		: CSR_AArch64_CXX_TLS_Darwin_RegMask;
if (CC == CallingConv::AArch64_VectorCall)		if (CC == CallingConv::AArch64_VectorCall)
// FIXME: default to AAPCS until we add full support.		return SCS ? CSR_AArch64_AAVPCS_SCS_RegMask : CSR_AArch64_AAVPCS_RegMask;
return SCS ? CSR_AArch64_AAPCS_SCS_RegMask : CSR_AArch64_AAPCS_RegMask;
if (MF.getSubtarget<AArch64Subtarget>().getTargetLowering()		if (MF.getSubtarget<AArch64Subtarget>().getTargetLowering()
->supportSwiftError() &&		->supportSwiftError() &&
MF.getFunction().getAttributes().hasAttrSomewhere(Attribute::SwiftError))		MF.getFunction().getAttributes().hasAttrSomewhere(Attribute::SwiftError))
return SCS ? CSR_AArch64_AAPCS_SwiftError_SCS_RegMask		return SCS ? CSR_AArch64_AAPCS_SwiftError_SCS_RegMask
: CSR_AArch64_AAPCS_SwiftError_RegMask;		: CSR_AArch64_AAPCS_SwiftError_RegMask;
if (CC == CallingConv::PreserveMost)		if (CC == CallingConv::PreserveMost)
return SCS ? CSR_AArch64_RT_MostRegs_SCS_RegMask		return SCS ? CSR_AArch64_RT_MostRegs_SCS_RegMask
: CSR_AArch64_RT_MostRegs_RegMask;		: CSR_AArch64_RT_MostRegs_RegMask;
[... 368 lines not shown ...]
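
The practical difference between the two save lists selected above can be summarised in a short sketch. The enum and function names are hypothetical and do not exist in LLVM. The sketch relies on two facts: the base AAPCS64 requires a callee to preserve only the low 64 bits of V8-V15, while this patch's summary states that `aarch64_vector_pcs` functions save and restore V8-V23 (as full 128-bit Q registers).

```cpp
#include <cassert>

// Hypothetical enum for illustration only; LLVM's real IDs live in CallingConv.
enum class Convention { AAPCS, VectorPCS };

// Number of bytes of vector register V<N> that the callee must preserve.
//  - AAPCS:     low 64 bits (8 bytes) of V8-V15.
//  - VectorPCS: all 128 bits (16 bytes) of V8-V23.
inline unsigned preservedVectorBytes(Convention CC, unsigned N) {
  assert(N < 32 && "AArch64 has vector registers V0-V31");
  switch (CC) {
  case Convention::AAPCS:
    return (N >= 8 && N <= 15) ? 8 : 0;
  case Convention::VectorPCS:
    return (N >= 8 && N <= 23) ? 16 : 0;
  }
  return 0;
}
```

This is why the frame lowering code has to handle 16-byte spill slots for these functions: a register such as Q16 is caller-saved under the base convention but callee-saved in full under the vector one.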

test/CodeGen/AArch64/aarch64-vector-pcs.mir

This file was added.

				# RUN: llc -mtriple=aarch64-linux-gnu -run-pass=prologepilog %s -o - | FileCheck %s

				# The tests below check the allocation of 128-bit callee-saves
				# on the stack, specifically their offsets.

				# Padding of GPR64-registers is needed to ensure 16 byte alignment of
				# the stack pointer after the GPR64/FPR64 block (which is also needed
				# for the FPR128 saves when present).

				# This file also tests whether an emergency stack slot is allocated
				# when the stack frame is over a given size, caused by a series of
				# FPR128 saves. The alignment padding can leave a gap that stack
				# slot scavenging may reuse, so it is important that the stack size
				# is properly estimated.


				--- |

				; ModuleID = '<stdin>'
				source_filename = "<stdin>"
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				; Function Attrs: nounwind
				define aarch64_vector_pcs void @test_q10_q11_x19() nounwind { entry: unreachable }

				; Function Attrs: nounwind
				define aarch64_vector_pcs void @test_q10_q11_x19_x20() nounwind { entry: unreachable }

				; Function Attrs: nounwind
				define aarch64_vector_pcs void @test_q10_q11_x19_x20_x21() nounwind { entry: unreachable }

				; Function Attrs: nounwind
				define aarch64_vector_pcs void @test_q8_to_q23_x19_to_x30() nounwind { entry: unreachable }

				; Function Attrs: nounwind
				define aarch64_vector_pcs void @test_q8_to_q23_x19_to_x30_preinc() nounwind { entry: unreachable }

				...
				---
				name: test_q10_q11_x19
				tracksRegLiveness: true
				body: |
				bb.0.entry:
				$x19 = IMPLICIT_DEF
				$q10 = IMPLICIT_DEF
				$q11 = IMPLICIT_DEF

				; Check that the alignment gap for the 8-byte x19 is padded
				; with another 8 bytes. The CSR region will look like this:
				; +-------------------+
				; |/////padding///////| (8 bytes)
				; | X19               | (8 bytes)
				; +-------------------+ <- SP -16
				; | Q10, Q11          | (32 bytes)
				; +-------------------+ <- SP -48

				; CHECK-LABEL: test_q10_q11_x19{{[[:space:]]}}
				; CHECK-DAG: $sp = frame-setup STPQpre killed $q11, killed $q10, $sp, -3 :: (store 16 into %stack.[[Q11:[0-9]+]]), (store 16 into %stack.[[Q10:[0-9]+]])
				; CHECK-DAG: - { id: [[Q11]], {{.*}}, offset: -48, size: 16, alignment: 16
				; CHECK-DAG: - { id: [[Q10]], {{.*}}, offset: -32, size: 16, alignment: 16
				; CHECK-DAG: frame-setup STRXui killed $x19, $sp, 4 :: (store 8 into %stack.[[X19:[0-9]+]])
				; CHECK-DAG: - { id: [[X19]], {{.*}}, offset: -16, size: 8, alignment: 16

				...
				---
				name: test_q10_q11_x19_x20
				alignment: 2
				tracksRegLiveness: true
				body: |
				bb.0.entry:
				$x19 = IMPLICIT_DEF
				$x20 = IMPLICIT_DEF
				$q10 = IMPLICIT_DEF
				$q11 = IMPLICIT_DEF

				; +-------------------+
				; | X19, X20          | (16 bytes)
				; +-------------------+ <- SP -16
				; | Q10, Q11          | (32 bytes)
				; +-------------------+ <- SP -48

				; CHECK-LABEL: test_q10_q11_x19_x20{{[[:space:]]}}
				; CHECK-DAG: $sp = frame-setup STPQpre killed $q11, killed $q10, $sp, -3 :: (store 16 into %stack.[[Q11:[0-9]+]]), (store 16 into %stack.[[Q10:[0-9]+]])
				; CHECK-DAG: frame-setup STPXi killed $x20, killed $x19, $sp, 4 :: (store 8 into %stack.[[X20:[0-9]+]]), (store 8 into %stack.[[X19:[0-9]+]])
				; CHECK-DAG: - { id: [[Q11]], {{.*}}, offset: -48, size: 16, alignment: 16
				; CHECK-DAG: - { id: [[Q10]], {{.*}}, offset: -32, size: 16, alignment: 16
				; CHECK-DAG: - { id: [[X20]], {{.*}}, offset: -16, size: 8, alignment: 8
				; CHECK-DAG: - { id: [[X19]], {{.*}}, offset: -8, size: 8, alignment: 8

				...
				---
				name: test_q10_q11_x19_x20_x21
				tracksRegLiveness: true
				body: |
				bb.0.entry:
				$x19 = IMPLICIT_DEF
				$x20 = IMPLICIT_DEF
				$x21 = IMPLICIT_DEF
				$q10 = IMPLICIT_DEF
				$q11 = IMPLICIT_DEF

				; Check that the alignment gap is padded with another 8 bytes.
				; The CSR region will look like this:
				; +-------------------+
				; | X19, X20          | (16 bytes)
				; +-------------------+ <- SP -16
				; |/////padding///////| (8 bytes)
				; | X21               | (8 bytes)
				; +-------------------+ <- SP -32
				; | Q10, Q11          | (32 bytes)
				; +-------------------+ <- SP -64

				; CHECK-LABEL: test_q10_q11_x19_x20_x21
				; CHECK-DAG: $sp = frame-setup STPQpre killed $q11, killed $q10, $sp, -4 :: (store 16 into %stack.[[Q11:[0-9]+]]), (store 16 into %stack.[[Q10:[0-9]+]])
				; CHECK-DAG: frame-setup STRXui killed $x21, $sp, 4 :: (store 8 into %stack.[[X21:[0-9]+]])
				; CHECK-DAG: frame-setup STPXi killed $x20, killed $x19, $sp, 6
				; CHECK-DAG: - { id: [[Q11]], {{.*}}, offset: -64, size: 16, alignment: 16
				; CHECK-DAG: - { id: [[Q10]], {{.*}}, offset: -48, size: 16, alignment: 16
				; CHECK-DAG: - { id: [[X21]], {{.*}}, offset: -32, size: 8, alignment: 16

				...
				---
				name: test_q8_to_q23_x19_to_x30
				tracksRegLiveness: true
				body: |
				bb.0.entry:
				$x19 = IMPLICIT_DEF
				$x20 = IMPLICIT_DEF
				$x21 = IMPLICIT_DEF
				$x22 = IMPLICIT_DEF
				$x23 = IMPLICIT_DEF
				$x24 = IMPLICIT_DEF
				$x25 = IMPLICIT_DEF
				$x26 = IMPLICIT_DEF
				$x27 = IMPLICIT_DEF
				$x28 = IMPLICIT_DEF
				$fp = IMPLICIT_DEF
				$lr = IMPLICIT_DEF
				$q8 = IMPLICIT_DEF
				$q9 = IMPLICIT_DEF
				$q10 = IMPLICIT_DEF
				$q11 = IMPLICIT_DEF
				$q12 = IMPLICIT_DEF
				$q13 = IMPLICIT_DEF
				$q14 = IMPLICIT_DEF
				$q15 = IMPLICIT_DEF
				$q16 = IMPLICIT_DEF
				$q17 = IMPLICIT_DEF
				$q18 = IMPLICIT_DEF
				$q19 = IMPLICIT_DEF
				$q20 = IMPLICIT_DEF
				$q21 = IMPLICIT_DEF
				$q22 = IMPLICIT_DEF
				$q23 = IMPLICIT_DEF

				; Test with more callee saves, which triggers 'BigStack' in
				; AArch64FrameLowering which in turn causes an emergency spill
				; slot to be allocated. The emergency spill slot is allocated
				; as close as possible to SP, so at SP + 0.
				; +-------------------+
				; | X19..X30          | (96 bytes)
				; +-------------------+ <- SP -96
				; | Q8..Q23           | (256 bytes)
				; +-------------------+ <- SP -352
				; | emergency slot    | (16 bytes)
				; +-------------------+ <- SP -368

				; CHECK-LABEL: test_q8_to_q23_x19_to_x30
				; CHECK: $sp = frame-setup SUBXri $sp, 368, 0
				; CHECK-NEXT: frame-setup STPQi killed $q23, killed $q22, $sp, 1 :: (store 16 into %stack.{{[0-9]+}}), (store 16 into %stack.{{[0-9]+}})
				; CHECK-NEXT: frame-setup STPQi killed $q21, killed $q20, $sp, 3
				; CHECK-NEXT: frame-setup STPQi killed $q19, killed $q18, $sp, 5
				; CHECK-NEXT: frame-setup STPQi killed $q17, killed $q16, $sp, 7
				; CHECK-NEXT: frame-setup STPQi killed $q15, killed $q14, $sp, 9
				; CHECK-NEXT: frame-setup STPQi killed $q13, killed $q12, $sp, 11
				; CHECK-NEXT: frame-setup STPQi killed $q11, killed $q10, $sp, 13
				; CHECK-NEXT: frame-setup STPQi killed $q9, killed $q8, $sp, 15
				; CHECK-NEXT: frame-setup STPXi killed $x28, killed $x27, $sp, 34 :: (store 8 into %stack.{{[0-9]+}}), (store 8 into %stack.{{[0-9]+}})
				; CHECK-NEXT: frame-setup STPXi killed $x26, killed $x25, $sp, 36
				; CHECK-NEXT: frame-setup STPXi killed $x24, killed $x23, $sp, 38
				; CHECK-NEXT: frame-setup STPXi killed $x22, killed $x21, $sp, 40
				; CHECK-NEXT: frame-setup STPXi killed $x20, killed $x19, $sp, 42
				; CHECK-NEXT: frame-setup STPXi killed $fp, killed $lr, $sp, 44

				...
				---
				name: test_q8_to_q23_x19_to_x30_preinc
				tracksRegLiveness: true
				stack:
				- { id: 0, size: 160, alignment: 4, local-offset: 0 }
				constants:
				body: |
				bb.0.entry:
				$x19 = IMPLICIT_DEF
				$x20 = IMPLICIT_DEF
				$x21 = IMPLICIT_DEF
				$x22 = IMPLICIT_DEF
				$x23 = IMPLICIT_DEF
				$x24 = IMPLICIT_DEF
				$x25 = IMPLICIT_DEF
				$x26 = IMPLICIT_DEF
				$x27 = IMPLICIT_DEF
				$x28 = IMPLICIT_DEF
				$fp = IMPLICIT_DEF
				$lr = IMPLICIT_DEF
				$q8 = IMPLICIT_DEF
				$q9 = IMPLICIT_DEF
				$q10 = IMPLICIT_DEF
				$q11 = IMPLICIT_DEF
				$q12 = IMPLICIT_DEF
				$q13 = IMPLICIT_DEF
				$q14 = IMPLICIT_DEF
				$q15 = IMPLICIT_DEF
				$q16 = IMPLICIT_DEF
				$q17 = IMPLICIT_DEF
				$q18 = IMPLICIT_DEF
				$q19 = IMPLICIT_DEF
				$q20 = IMPLICIT_DEF
				$q21 = IMPLICIT_DEF
				$q22 = IMPLICIT_DEF
				$q23 = IMPLICIT_DEF

				; When the total stack size is >= 512 bytes, the prologue uses a
				; pre-indexed store (STPQpre) to allocate the callee-save area
				; rather than a single 'sub sp, sp, <size>'.
				; +-------------------+
				; | X19..X30          | (96 bytes)
				; +-------------------+ <- SP -96
				; | Q8..Q23           | (256 bytes)
				; +-------------------+ <- SP -352
				; | 'obj'             | (160 bytes)
				; +-------------------+ <- SP -512
				; | emergency slot    | (16 bytes)
				; +-------------------+ <- SP -528

				; CHECK-LABEL: test_q8_to_q23_x19_to_x30_preinc
				; CHECK: $sp = frame-setup STPQpre killed $q23, killed $q22, $sp, -22 :: (store 16 into %stack.{{[0-9]+}}), (store 16 into %stack.{{[0-9]+}})
				; CHECK-NEXT: frame-setup STPQi killed $q21, killed $q20, $sp, 2
				; CHECK-NEXT: frame-setup STPQi killed $q19, killed $q18, $sp, 4
				; CHECK-NEXT: frame-setup STPQi killed $q17, killed $q16, $sp, 6
				; CHECK-NEXT: frame-setup STPQi killed $q15, killed $q14, $sp, 8
				; CHECK-NEXT: frame-setup STPQi killed $q13, killed $q12, $sp, 10
				; CHECK-NEXT: frame-setup STPQi killed $q11, killed $q10, $sp, 12
				; CHECK-NEXT: frame-setup STPQi killed $q9, killed $q8, $sp, 14
				; CHECK-NEXT: frame-setup STPXi killed $x28, killed $x27, $sp, 32 :: (store 8 into %stack.{{[0-9]+}}), (store 8 into %stack.{{[0-9]+}})
				; CHECK-NEXT: frame-setup STPXi killed $x26, killed $x25, $sp, 34
				; CHECK-NEXT: frame-setup STPXi killed $x24, killed $x23, $sp, 36
				; CHECK-NEXT: frame-setup STPXi killed $x22, killed $x21, $sp, 38
				; CHECK-NEXT: frame-setup STPXi killed $x20, killed $x19, $sp, 40
				; CHECK-NEXT: frame-setup STPXi killed $fp, killed $lr, $sp, 42
				; CHECK-NEXT: $sp = frame-setup SUBXri $sp, 176, 0

				...
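
The immediates in the CHECK lines above are scaled by the access size rather than given in bytes: 16 for the Q-register pair stores (`STPQi`, `STPQpre`) and 8 for the X-register stores (`STPXi`, `STRXui`). The following sketch, with hypothetical helper names, shows the conversion used when reading the expected offsets.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset encoded by a scaled load/store immediate.
// Scale is the access size in bytes: 16 for Q registers, 8 for X registers.
inline int64_t byteOffset(int64_t Imm, int64_t Scale) { return Imm * Scale; }

// Scaled immediate for a given byte offset; the offset must be a
// multiple of the scale to be encodable.
inline int64_t scaledImm(int64_t Bytes, int64_t Scale) {
  assert(Scale > 0 && Bytes % Scale == 0 &&
         "offset must be a multiple of the scale");
  return Bytes / Scale;
}
```

For example, `STPQpre ... $sp, -22` moves SP down by 22 * 16 = 352 bytes, which is the 96-byte GPR block plus the 256-byte Q8..Q23 block. In `test_q8_to_q23_x19_to_x30`, the first X-register pair sits above the 16-byte emergency slot and the 256 bytes of Q registers, so its immediate is (16 + 256) / 8 = 34.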

Revision Contents

Diff 163493
