This is an archive of the discontinued LLVM Phabricator instance.

Differential D42590

[PowerPC] Try to move the stack pointer update instruction later in the prologue and earlier in the epilogue (Version 2)
Closed (Public)

Authored by stefanp on Jan 26 2018, 11:02 AM.

Details

Reviewers

kbarton
nemanjai
inouehrs
sfertile
lei
syzaara
jtony
hfinkel
echristo

Commits

rGbd5429ef38de: [PowerPC] Move the stack pointer update instruction later in the prologue and…
rL355085: [PowerPC] Move the stack pointer update instruction later in the prologue and…

Summary

This is the second attempt to move the stack pointer update instruction (stdu) later in the prologue and earlier in the epilogue.
The first attempt was D41737, but it contained a bug that was exposed by "make test" in libvpx.

To fix that issue, the transformation is turned off for functions that require frame index scavenging. To determine whether a function requires it, requiresFrameIndexScavenging had to be implemented for PowerPC.

Event Timeline

stefanp created this revision. Jan 26 2018, 11:02 AM

lei added inline comments. Jan 30 2018, 8:57 AM

lib/Target/PowerPC/PPCRegisterInfo.cpp
365	maybe an early exit here instead... `if (FrIdx >=0) continue;`

syzaara added inline comments. Jan 30 2018, 11:09 AM

lib/Target/PowerPC/PPCFrameLowering.cpp
842	form -> from
852	Can use a range based for loop.
1397	Can use range based for loop here as well.

Fixed the issues mentioned in previous comments.

stefanp marked 4 inline comments as done. Jan 31 2018, 1:40 PM

nemanjai added inline comments. Feb 1 2018, 3:46 PM

lib/Target/PowerPC/PPCRegisterInfo.cpp
362	We use spaces around binary/assignment operators. Maybe just clang-format-diff the patch.
369	I'm not a fan of this solution. It provides yet another place we check for the register class for a physical register without a clear explanation for why we care about the register class. I'd much prefer a unified solution between `StoreRegToStackSlot()`, `isStoreToStackSlot()` and `requiresFrameIndexScavenging()`. What I'm thinking is something along the lines of:

    static const unsigned OpcodesForSpills[] = { PPC::STD, PPC::STW, ... };
    PPCInstrInfo::getOpcodeForSpill(unsigned Reg, const TargetRegisterClass *RC = nullptr);

That way we'd have a single definitive list of opcodes that are used for spilling registers and wouldn't have to keep this delicate dance of keeping multiple functions in sync.

`isStoreToStackSlot()` would just check the array to see if its opcode is in there
`StoreRegToStackSlot()` and `requiresFrameIndexScavenging()` would use `getOpcodeForSpill()` with the register class or physical register respectively
`getOpcodeForSpill()` would just compute the index into the array based on the register class and target features and return the respective element

Of course, it doesn't have to be done that way, but any solution that would unify this would be good.
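The single-table idea in this comment can be sketched on its own. This is a minimal standalone illustration, not the code that was eventually committed: the `Opcode` and `RegClass` enumerators and the helpers `isSpillStoreOpcode` and `isXFormSpill` are hypothetical stand-ins, and the real list of spill opcodes is longer and also depends on target features.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical stand-ins for PPC opcodes and register classes.
enum Opcode { STD, STW, STFD, STVX, ADDI };
enum RegClass { G8RC, GPRC, F8RC, VRRC, NumRegClasses };

// Single definitive list of spill opcodes, indexed by register class.
static const Opcode OpcodesForSpill[] = {STD, STW, STFD, STVX};

// Look up the spill opcode for a register class.
Opcode getOpcodeForSpill(RegClass RC) {
  assert(RC < NumRegClasses && "unknown register class");
  return OpcodesForSpill[RC];
}

// An opcode is a spill store exactly when it appears in the table.
bool isSpillStoreOpcode(Opcode Opc) {
  return std::find(std::begin(OpcodesForSpill), std::end(OpcodesForSpill),
                   Opc) != std::end(OpcodesForSpill);
}

// In this sketch only the vector store (STVX) is an indexed X-Form store.
bool isXFormSpill(RegClass RC) { return getOpcodeForSpill(RC) == STVX; }
```

Because the lookup and the membership test read the same array, supporting a new register class means editing one table rather than keeping several functions in sync.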

nemanjai added inline comments. Feb 2 2018, 5:51 AM

lib/Target/PowerPC/PPCFrameLowering.cpp
857	I imagine this is impossible, but it may not be a terrible idea to assert that you haven't somehow iterated past a terminator.
1388	There are a few places where there are extra spaces in comments. But I'm sure that it'll all get fixed up when you run clang-format-diff.
1399	This probably applies both here and above, but doesn't the condition `FrIdx > 0` actually mean we should not do this at all? I'm not sure if it's possible, but what if you had frame indices that are say { negative, negative, positive, negative } as you iterate? Wouldn't you want to leave the function alone then? Also, what is it about `FrIdx == 0` that prevents you from moving the stack pointer update up past it? For that matter, there should be a comment why we require the condition at all. Finally, please add a comment as to why it is not necessary to manually update the offsets for the CSR spills and restores - we just move the stack ptr update and something else figures out the offsets.

After some code cleanup that was required for requiresFrameIndexScavenging here is the new version of this patch.
It should take into consideration all of the reviewer's comments.

Overall I'm very happy with how this patch looks now. The cleanup really allowed this patch to flow much more clearly. However, I think you still haven't addressed the comment https://reviews.llvm.org/D42590#inline-374441

lib/Target/PowerPC/PPCRegisterInfo.cpp
321	Indices.

inouehrs added inline comments. Apr 12 2018, 10:17 PM

lib/Target/PowerPC/PPCFrameLowering.cpp
1552	If the latency of MTLR is the critical path in the epilogue, can we move this load before stack pointer update (with adjusted offset) to hide the latency further? (But this can be a separate patch.)

stefanp added inline comments. Apr 16 2018, 10:41 AM

lib/Target/PowerPC/PPCFrameLowering.cpp
1552	That's a good point. When I did the original performance tests I had moved the mtlr past the callee restores and not all the way past the stack pointer update. I'm going to update this patch without that change and then I'll put in another patch with that change alone.

I've updated the negative frame indices to use isFixedObjectIndex which is cleaner.

lei added inline comments. Apr 26 2018, 1:32 PM

lib/Target/PowerPC/PPCFrameLowering.cpp
855	This `if` can either be merged into the one below... `if (FrIdx <0 && MFI.isFixedObjectIndex ...)` or do an early exit of this loop iteration `if (FrIdx >=0) continue;`

lei added inline comments. Apr 26 2018, 1:45 PM

lib/Target/PowerPC/PPCFrameLowering.cpp
1388	I don't think you addressed this issue with the extra spaces in comments. From what I can see we don't put an extra space after `//` when continuing a comment from the previous line.
1399	merge this if statement with the nested if

Address comments from previous review.

inouehrs added inline comments. Jun 7 2018, 1:45 AM

lib/Target/PowerPC/PPCFrameLowering.cpp
1239	Since we have only one `MovingStackUpdateDown` flag, all CSIs are modified back even if only a part of them were updated above.

nemanjai added inline comments. Aug 14 2018, 10:23 AM

lib/Target/PowerPC/PPCFrameLowering.cpp
1239	@stefanp Can you respond to this comment? Is this a problem? Can we get into a situation where we end up with incorrect offsets if not all of the callee-saved spills/restores have been moved/updated?

I'm sorry it took so long for me to look at this review.

I think you are correct, Hiroshi, in that what I was doing is not safe. I've added a check where the operation is aborted if not all of the Callee Saved Info is updated at the same time.
I also added a couple of lines to getStoreOpcodeForSpill and getLoadOpcodeForSpill that were missed when the Signal Processing Engine work was done. This was exposed by this patch.
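The abort-unless-everything-is-updated check can be read as a check-then-commit pattern. The sketch below is a minimal standalone illustration under stated assumptions: `SavedSlot` is a hypothetical record standing in for the callee-saved info plus frame info, the qualifying condition mirrors the one visible in the diff (`isFixedObjectIndex` and a negative object offset), and any non-qualifying slot causes an abort, which is stricter than what the patch does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of one callee-saved register slot.
struct SavedSlot {
  bool IsFixed; // true if the slot is a fixed stack object
  int Offset;   // object offset as recorded in the frame info
};

// Rebase every slot by NegFrameSize only if all slots qualify; otherwise
// leave the vector untouched and report that the update must not be moved.
bool rebaseAllOrNothing(std::vector<SavedSlot> &Slots, int NegFrameSize) {
  assert(NegFrameSize <= 0 && "expected a negated frame size");
  // First pass: check only, no mutation.
  for (const SavedSlot &S : Slots)
    if (!S.IsFixed || S.Offset >= 0)
      return false;
  // Second pass: every slot qualified, so it is safe to commit all changes.
  for (SavedSlot &S : Slots)
    S.Offset += NegFrameSize;
  return true;
}
```

Separating the check from the mutation means a failed attempt never leaves some offsets adjusted and others not, which is the situation the earlier inline comments on line 1239 worry about.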

Herald added a subscriber: jsji. Oct 10 2018, 1:08 PM

I think this is close to ready but there are a few comments that have to be addressed. Also, @hfinkel and @inouehrs do you see anything that needs to be handled in addition to what I commented on?

lib/Target/PowerPC/PPCFrameLowering.cpp
846	No need to make the reader work it out. Add something like this to the comment: (i.e. a frame size of 32768 bytes satisfies isLargeFrame here, but not in the epilogue inserter).
847	Why do we need this? How come we can't just use `RegInfo` defined above? It should be the same object should it not?
855	This needs a comment explaining why we need this check and also why failing this check with one of the Callee Saved Register spills doesn't require us to abort the operation the way failing the subsequent check does.
1394	The condition here looks like a repeat of the one in the prologue inserter. Can we actually extract this into a function that can be queried by both? Then this can be changed to something like: `if (stackUpdateCanBeMoved())` or something equivalently simple.
1399	Same comment as a similar check in the prologue inserter.
lib/Target/PowerPC/PPCRegisterInfo.cpp
310	"saved info" doesn't really mean anything in this context. The CalleeSavedInfo type is a structure containing information about Callee Saved Registers.
314	Same as above.
317	Formatting.
330	I am not sure I really follow when we need FI scavenging... I was under the impression that we might need it if:
The frame size is too large so we need to use an X-Form store/load for the spill/restore
The alignment of the spill/restore is lower than what is required by the D-Form (4 for DS-Form, 16 for DQ-Form)
The only opcode for the spill/restore is an X-Form
This if statement certainly seems to accomplish the last of those but I don't see anything that accomplishes the other two. The first can certainly be an early exit before the loop.
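The three conditions listed in the comment on line 330 can be written as one predicate. This is a minimal sketch of the reviewer's description, not of the committed `requiresFrameIndexScavenging`: the `SpillForm` enum and `mayNeedScavenging` are hypothetical names, and treating plain D-Form as having no alignment requirement is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical classification of the instruction forms named in the comment.
enum class SpillForm { DForm, DSForm, DQForm, XForm };

// Required slot alignment per form: 4 for DS-Form and 16 for DQ-Form as
// stated in the review; 1 for plain D-Form is an assumption of this sketch.
unsigned requiredAlignment(SpillForm Form) {
  switch (Form) {
  case SpillForm::DSForm:
    return 4;
  case SpillForm::DQForm:
    return 16;
  default:
    return 1;
  }
}

bool mayNeedScavenging(unsigned FrameSize, SpillForm Form, unsigned SlotAlign) {
  assert(SlotAlign != 0 && "alignment must be non-zero");
  // Frame too large for a 16-bit displacement: early exit, no loop needed.
  if (FrameSize > 0x7FFF)
    return true;
  // The only available spill opcode is an indexed X-Form.
  if (Form == SpillForm::XForm)
    return true;
  // Slot alignment is lower than what the displacement form requires.
  return SlotAlign < requiredAlignment(Form);
}
```

The frame-size test comes first because it does not depend on any individual spill, which matches the remark that it can be an early exit before the loop.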

This revision now requires changes to proceed. Dec 29 2018, 3:17 PM

hfinkel added inline comments. Dec 30 2018, 8:58 AM

lib/Target/PowerPC/PPCFrameLowering.cpp
846	I think this is restricted to frames where the relevant saves (i.e., whatever is stored before the stack-pointer update is performed) must be less than the red-zone size, which at least under ELF v2 is 288 bytes (see ABI spec: 2.2.2.4 Protected Zone). Maybe this is handled below somehow, but so long as we have a comment here explaining everything, we should explain how this factors in as well. It is, however, not clear to me where you are checking this below. If you're not, I'm pretty sure that's a significant problem. If we're saving more than just GPRs, we can easily exceed 288 bytes. An explicit test case for this would also be a very good idea.

nemanjai added inline comments. Dec 30 2018, 2:52 PM

lib/Target/PowerPC/PPCFrameLowering.cpp
846	We are not actually spilling into the Red Zone here (or at least not if we wouldn't in the original code) but into the current stack frame. This just moves the stack pointer update instruction around. However, the idea is to save into the same stack frame/slot by adjusting the offset in each spill to account for how much the stack pointer will be updated by. Namely, change something like (the actual values have nothing to do with reality, just an illustration):

    stdu 1, -48(1)
    std 29, 16(1)
    std 30, 8(1)

to something like

    std 29, -32(1)
    std 30, -40(1)
    stdu 1, -48(1)

That way all the stores can be dispatched in parallel rather than the CSR spills for R29 and R30 waiting for the update to R1 to actually complete. But we are still saving everything to the exact same memory location. But I agree that we should add further testing to clearly illustrate such a change in behaviour.
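The displacement adjustment in this example is plain arithmetic: a store issued before the `stdu` must subtract the frame size from its displacement to reach the same address. A minimal sketch using the illustrative numbers from the comment; the function names are hypothetical.

```cpp
#include <cassert>

// Displacement to use when the store is issued before the stack pointer
// update: same address, expressed relative to the old stack pointer.
int displacementBeforeUpdate(int DispAfterUpdate, int FrameSize) {
  assert(FrameSize > 0 && "expected a positive frame size");
  return DispAfterUpdate - FrameSize;
}

// Address written by a store with displacement Disp off stack pointer SP.
long effectiveAddress(long SP, int Disp) { return SP + Disp; }
```

With a 48-byte frame, `16(1)` after the update becomes `-32(1)` before it and `8(1)` becomes `-40(1)`, and both forms resolve to the same effective address.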

hfinkel added inline comments. Dec 30 2018, 5:21 PM

lib/Target/PowerPC/PPCFrameLowering.cpp
846	> We are not actually spilling into the Red Zone here (or at least not if we wouldn't in the original code) but into the current stack frame.

But we are, as you've illustrated with your example...

    stdu 1, -48(1)
    std 29, 16(1)
    std 30, 8(1)

here we adjust the stack pointer first, and all stores are to addresses above the stack pointer.

    std 29, -32(1)
    std 30, -40(1)
    stdu 1, -48(1)

here we adjust it afterward, and so the first two stores are below the current stack pointer. That's into the red zone. It becomes not the red zone only after you adjust the stack pointer. But, if we have:

    std 29, -332(1)
    std 30, -340(1)
    ; interrupt handler runs here!
    stdu 1, -348(1)

(I made the numbers bigger to illustrate the issue) then we have a potential problem. The FP save area can be up to 256 bytes, and then comes the GPR area. Then the vector area (which can be up to 512 bytes). Thus, we can certainly go beyond 288 bytes.

Addressed reviewer comments.
Limited the optimization to cases where the fixed callee-saved registers fit in the red zone of 288 bytes.
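The red-zone limit can be checked against the displacements used in the earlier examples. This is a minimal standalone sketch rather than the check in the patch; `storesFitInRedZone` is a hypothetical helper that takes the displacements of the stores issued before the stack pointer update.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// True if every store issued before the stack pointer update starts within
// RedZoneSize bytes below the current stack pointer.
bool storesFitInRedZone(const std::vector<int> &Displacements,
                        int RedZoneSize) {
  assert(RedZoneSize >= 0 && "red zone size cannot be negative");
  return std::all_of(Displacements.begin(), Displacements.end(),
                     [RedZoneSize](int Disp) { return Disp >= -RedZoneSize; });
}
```

The small example (`-32`, `-40`) passes with a 288-byte red zone, while the enlarged one (`-332`, `-340`) does not, which is the case where an interrupt arriving before the `stdu` could overwrite the saved registers.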

Some quick notes.

lib/Target/PowerPC/PPCFrameLowering.cpp
702	Can we use getRedZoneSize instead of hardcoded 288 here?
706	We should check ABI before checking framesize? As the red zone size is different for different ABIs. eg: DarwinABI has a 224-byte red zone. PPC32 SVR4ABI(Non-DarwinABI) has no red zone and PPC64 SVR4ABI has a 288-byte red zone.
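The per-ABI sizes quoted in the comment on line 706 show why the ABI has to be known before the frame size is compared. A minimal sketch with hypothetical names (`ABI`, `redZoneSize`, `fitsInRedZone`); it encodes only the three values given in the comment and is not the subtarget's actual `getRedZoneSize`.

```cpp
#include <cassert>

enum class ABI { Darwin, SVR4 };

// Red zone sizes as quoted in the review comment.
unsigned redZoneSize(ABI Abi, bool IsPPC64) {
  if (Abi == ABI::Darwin)
    return 224;
  return IsPPC64 ? 288 : 0;
}

// The same frame size can fit on one ABI and not on another, so the ABI is
// resolved before the size comparison.
bool fitsInRedZone(unsigned FrameSize, ABI Abi, bool IsPPC64) {
  assert(FrameSize != 0 && "a zero-size frame has no stack pointer update");
  return FrameSize <= redZoneSize(Abi, IsPPC64);
}
```

For instance, a 256-byte frame fits under 64-bit SVR4 but not under Darwin, and nothing fits under 32-bit SVR4.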

Addressed comments from Jinsong.

I've removed the magic number and I'm using getRedZoneSize() now.
I've also moved the isPPC64() check up as well as adding an isELFv2ABI() check.

Please add the following test cases:

A stack frame that is just a bit larger than required
A stack frame with a CSR save that is not a fixed object
A stack frame where we have to use scavenging
A test case where everything fits and it saves multiple register types (GPR, FPR, VSR)

I would recommend that you produce the CHECK directives using the script in utils/ and commit it first as an NFC patch so that this patch clearly shows the differences in the code emitted.

lib/Target/PowerPC/PPCFrameLowering.cpp
486	This seems to duplicate code. Can we not implement the update version in terms of this function? Seems that the only thing that slightly complicates things is the computation of `maxCallFrameSize` so the update form should probably have that as a pointer output parameter with a `nullptr` default.
693	Combine the subtarget feature check early exits.
717	s/mode/move
725	It is not clear why? I presume because the scavenger can add further spills? In any case, the comment should mention why. Also: `return !RegInfo->requiresFrameIndexScavenging(MF);`
844	Won't we be broken if we do this to some of the CSR's and then we bail because one of them doesn't satisfy this condition below? It seems that we need to either defer any offset updates until we are sure we're moving the stack ptr update or keep track of any we updated so we can backtrack if we need to.
lib/Target/PowerPC/PPCFrameLowering.h
78 ↗	(On Diff #183590)	`callee saved instructions` sounds odd. Perhaps `CSR save/restore code`?
lib/Target/PowerPC/PPCRegisterInfo.cpp
318	I think a constant such as `0x7FFF` communicates intent more clearly. Furthermore, a condition such as this may clearly illustrate that we can't have any higher bits set: `if (FrameSize & ~0x7FFF)`
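The mask test suggested for line 318 and the `isInt<16>` checks discussed earlier can be compared side by side. A minimal standalone sketch: `fitsInSigned16` is a stand-in written for this example rather than LLVM's `isInt<16>`, and `hasBitsAbove0x7FFF` and `checkBoundary` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a signed 16-bit range check.
bool fitsInSigned16(std::int64_t X) {
  return X >= INT16_MIN && X <= INT16_MAX;
}

// The mask form suggested in the review: any bit above bit 14 set means the
// frame size cannot be used as a non-negative 16-bit displacement.
bool hasBitsAbove0x7FFF(std::uint32_t FrameSize) {
  return (FrameSize & ~UINT32_C(0x7FFF)) != 0;
}

// Boundary case from the review: 32768 fits when negated (-32768) but not
// as a positive value, so checks on FrameSize and on NegFrameSize disagree.
void checkBoundary() {
  assert(fitsInSigned16(-32768));
  assert(!fitsInSigned16(32768));
  assert(hasBitsAbove0x7FFF(32768));
}
```

For an unsigned frame size the mask test and the positive range check agree, so the mask states the same limit without relying on the sign of the operand.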

Committed revision 353994 with the test cases as NFC.
Will update the patch for the remaining comments soon...

Herald added a subscriber: jdoerfert. Feb 13 2019, 3:50 PM

Fixed according to review comments.

LGTM with the minor comments addressed on the commit.

Looking at the test case has actually made me think of something else that might be worthwhile following up on...
Namely, I think there would be general benefit of breaking the dependence of the link register restore from the stack pointer update when we are not moving the stack pointer update.

For example, the following change (from the first sequence to the second) might be a significant improvement in the epilogue:

    CSR restores
    addi 1, 1, <Value>
    ld 0, 16(1)
    mtlr 0

    CSR restores
    ld 0, <Value> + 16(1)
    mtlr 0
    addi 1, 1, <Value>

This would allow the scheduler to schedule the ld->mtlr sequence for restoring the link register in a way that the CSR restores hide the latency of the sequence. Furthermore, that should be a safe transformation regardless of the size of the stack frame since any interrupt handlers that run during the epilogue of a function must restore the LR when resuming control.

lib/Target/PowerPC/PPCFrameLowering.cpp
700	s/until the stack pointer is moved/until the stack pointer is updated
713	s/mode/move

Also, a comment that quite literally just says in words what the code clearly says isn't useful. It is more useful to say why we need the check. Perhaps something like:

    // Calls to fast_cc functions use different rules for passing parameters on
    // the stack from the ABI and using PIC base in the function imposes
    // similar restrictions to using the base pointer. It is not generally safe
    // to move the stack pointer update in these situations.

It might be reasonable to also combine these into a single condition, but this is completely a matter of personal preference.
1398	s/stack update pointer/update of the stack pointer
1405	`// Abort the operation as we can't update all CSR restores.`

This revision is now accepted and ready to land. Feb 27 2019, 5:01 AM

Closed by commit rL355085: [PowerPC] Move the stack pointer update instruction later in the prologue and… (authored by stefanp). Feb 28 2019, 4:22 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Feb 28 2019, 4:22 AM

Herald added a subscriber: llvm-commits.

lkail mentioned this in D97455: [PowerPC][AIX] Enable moving stack update for AIX. Feb 25 2021, 2:26 AM

Revision Contents

Path	Size
lib/Target/PowerPC/PPCFrameLowering.cpp	76 lines
lib/Target/PowerPC/PPCRegisterInfo.h	4 lines
lib/Target/PowerPC/PPCRegisterInfo.cpp	31 lines
test/CodeGen/PowerPC/MCSE-caller-preserved-reg.ll	10 lines
test/CodeGen/PowerPC/ppc-shrink-wrapping.ll	16 lines
test/CodeGen/PowerPC/tls_get_addr_clobbers.ll	18 lines

Diff 142663

lib/Target/PowerPC/PPCFrameLowering.cpp

[477 lines not shown]	unsigned PPCFrameLowering::determineFrameLayout(MachineFunction &MF,
if (UpdateMF)		if (UpdateMF)
MFI.setStackSize(FrameSize);		MFI.setStackSize(FrameSize);

return FrameSize;		return FrameSize;
}		}

// hasFP - Return true if the specified function actually has a dedicated frame		// hasFP - Return true if the specified function actually has a dedicated frame
// pointer register.		// pointer register.
bool PPCFrameLowering::hasFP(const MachineFunction &MF) const {		bool PPCFrameLowering::hasFP(const MachineFunction &MF) const {
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
// FIXME: This is pretty much broken by design: hasFP() might be called really		// FIXME: This is pretty much broken by design: hasFP() might be called really
// early, before the stack layout was calculated and thus hasFP() might return		// early, before the stack layout was calculated and thus hasFP() might return
// true or false here depending on the time of call.		// true or false here depending on the time of call.
return (MFI.getStackSize()) && needsFP(MF);		return (MFI.getStackSize()) && needsFP(MF);
}		}

// needsFP - Return true if the specified function should have a dedicated frame		// needsFP - Return true if the specified function should have a dedicated frame
[190 lines not shown]	void PPCFrameLowering::emitPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {		MachineBasicBlock &MBB) const {
MachineBasicBlock::iterator MBBI = MBB.begin();		MachineBasicBlock::iterator MBBI = MBB.begin();
MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
const PPCInstrInfo &TII = *Subtarget.getInstrInfo();		const PPCInstrInfo &TII = *Subtarget.getInstrInfo();
const PPCRegisterInfo *RegInfo = Subtarget.getRegisterInfo();		const PPCRegisterInfo *RegInfo = Subtarget.getRegisterInfo();

MachineModuleInfo &MMI = MF.getMMI();		MachineModuleInfo &MMI = MF.getMMI();
const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo();		const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo();
DebugLoc dl;		DebugLoc dl;
bool needsCFI = MMI.hasDebugInfo() ||		bool needsCFI = MMI.hasDebugInfo() ||
MF.getFunction().needsUnwindTableEntry();		MF.getFunction().needsUnwindTableEntry();

// Get processor type.		// Get processor type.
bool isPPC64 = Subtarget.isPPC64();		bool isPPC64 = Subtarget.isPPC64();
// Get the ABI.		// Get the ABI.
bool isSVR4ABI = Subtarget.isSVR4ABI();		bool isSVR4ABI = Subtarget.isSVR4ABI();
bool isELFv2ABI = Subtarget.isELFv2ABI();		bool isELFv2ABI = Subtarget.isELFv2ABI();
assert((Subtarget.isDarwinABI() || isSVR4ABI) &&		assert((Subtarget.isDarwinABI() || isSVR4ABI) &&
"Currently only Darwin and SVR4 ABIs are supported for PowerPC.");		"Currently only Darwin and SVR4 ABIs are supported for PowerPC.");

// Scan the prolog, looking for an UPDATE_VRSAVE instruction. If we find it,		// Scan the prolog, looking for an UPDATE_VRSAVE instruction. If we find it,
// process it.		// process it.
if (!isSVR4ABI)		if (!isSVR4ABI)
for (unsigned i = 0; MBBI != MBB.end(); ++i, ++MBBI) {		for (unsigned i = 0; MBBI != MBB.end(); ++i, ++MBBI) {
if (MBBI->getOpcode() == PPC::UPDATE_VRSAVE) {		if (MBBI->getOpcode() == PPC::UPDATE_VRSAVE) {
HandleVRSaveUpdate(*MBBI, TII);		HandleVRSaveUpdate(*MBBI, TII);
break;		break;
}		}
}		}

// Move MBBI back to the beginning of the prologue block.		// Move MBBI back to the beginning of the prologue block.
MBBI = MBB.begin();		MBBI = MBB.begin();

// Work out frame sizes.		// Work out frame sizes.
unsigned FrameSize = determineFrameLayout(MF);		unsigned FrameSize = determineFrameLayout(MF);
int NegFrameSize = -FrameSize;		int NegFrameSize = -FrameSize;
if (!isInt<32>(NegFrameSize))		if (!isInt<32>(NegFrameSize))
llvm_unreachable("Unhandled stack size!");		llvm_unreachable("Unhandled stack size!");

if (MFI.isFrameAddressTaken())		if (MFI.isFrameAddressTaken())
replaceFPWithRealFP(MF);		replaceFPWithRealFP(MF);

// Check if the link register (LR) must be saved.		// Check if the link register (LR) must be saved.
PPCFunctionInfo *FI = MF.getInfo<PPCFunctionInfo>();		PPCFunctionInfo *FI = MF.getInfo<PPCFunctionInfo>();
bool MustSaveLR = FI->mustSaveLR();		bool MustSaveLR = FI->mustSaveLR();
const SmallVectorImpl<unsigned> &MustSaveCRs = FI->getMustSaveCRs();		const SmallVectorImpl<unsigned> &MustSaveCRs = FI->getMustSaveCRs();
bool MustSaveCR = !MustSaveCRs.empty();		bool MustSaveCR = !MustSaveCRs.empty();
// Do we have a frame pointer and/or base pointer for this function?		// Do we have a frame pointer and/or base pointer for this function?
bool HasFP = hasFP(MF);		bool HasFP = hasFP(MF);
[84 lines not shown]	void PPCFrameLowering::emitPrologue(MachineFunction &MF,

// Frames of 32KB & larger require special handling because they cannot be		// Frames of 32KB & larger require special handling because they cannot be
// indexed into with a simple STDU/STWU/STD/STW immediate offset operand.		// indexed into with a simple STDU/STWU/STD/STW immediate offset operand.
bool isLargeFrame = !isInt<16>(NegFrameSize);		bool isLargeFrame = !isInt<16>(NegFrameSize);

assert((isPPC64 || !MustSaveCR) &&		assert((isPPC64 || !MustSaveCR) &&
"Prologue CR saving supported only in 64-bit mode");		"Prologue CR saving supported only in 64-bit mode");

		// Check if we can move the stack update instruction (stdu) down the prologue
		// past the callee saves. Hopefully this will avoid the situation where the
		// saves are waiting for the update on the store with update to complete.
		MachineBasicBlock::iterator StackUpdateLoc = MBBI;
		bool MovingStackUpdateDown = false;
		// This optimization has a number of guards. At this point we are being very
		// cautious and we do not try to do this when we have a fast call or
		// we are using PIC base or we are using a frame pointer or a base pointer.
		// It would be possible to turn on this optimization under these conditions
		// as well but it would require further modifications to the prologue and
		// epilogue. For example, if we want to turn on this optimization for
		// functions that use frame pointers we would have to take into consideration
		// the fact that spills to the stack may be using r30 instead of r1.
		// If the frame index requires scavenging there is the possibility that we may
		// require a spill in the prologue in which case it is unsafe to move the
		// stack pointer update.
		// Aside from that we need to have a non-zero frame and we need to have a
		// non-large frame size. Notice that we did not use !isLargeFrame but we used
		// isInt<16>(FrameSize) instead. This is important because this guard has to
		// be identical to the one in the epilogue and in the epilogue the variable
		// is defined as bool isLargeFrame = !isInt<16>(FrameSize);
		const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
		bool RequiresScavenging = TRI->requiresFrameIndexScavenging(MF);
		if (FrameSize && !FI->hasFastCall() && !FI->usesPICBase() && !HasFP &&
		!HasBP && isInt<16>(FrameSize) && !RequiresScavenging &&
		Subtarget.isPPC64()) {
		const std::vector<CalleeSavedInfo> &Info = MFI.getCalleeSavedInfo();
		for (CalleeSavedInfo CSI : Info) {
		int FrIdx = CSI.getFrameIdx();
		if (FrIdx < 0) {
		if (MFI.isFixedObjectIndex(FrIdx) && MFI.getObjectOffset(FrIdx) < 0) {
		MFI.setObjectOffset(FrIdx, MFI.getObjectOffset(FrIdx) + NegFrameSize);
		nemanjai (inline comment, Not Done): I imagine this is impossible, but it may not be a terrible idea to assert that you haven't somehow iterated past a terminator.
		StackUpdateLoc++;
		MovingStackUpdateDown = true;
		}
		}
		}
		}
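
hfinkel's red-zone concern above can be made concrete with a small check. This is a minimal sketch, not code from the patch: `RedZoneSize` and `storeIsRedZoneSafe` are hypothetical names, and the 288-byte figure is the one quoted in the comment.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 288 bytes below the stack pointer are protected, as stated in
// the review comment for 64-bit PowerPC.
constexpr int64_t RedZoneSize = 288;

// Hypothetical helper. Offset is the lowest address written by a store that
// is issued BEFORE the stack pointer update, relative to the current SP.
bool storeIsRedZoneSafe(int64_t Offset) {
  // Non-negative offsets land in the caller's frame, not below SP.
  if (Offset >= 0)
    return true;
  // Below SP: safe only while the store stays within the protected area.
  return -Offset <= RedZoneSize;
}
```

With the numbers from the comment, the stores at -32 and -40 pass the check, while the stores at -332 and -340 do not, which is the case where an interrupt handler could overwrite the saved values.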

// If we need to spill the CR and the LR but we don't have two separate		// If we need to spill the CR and the LR but we don't have two separate
// registers available, we must spill them one at a time		// registers available, we must spill them one at a time
if (MustSaveCR && SingleScratchReg && MustSaveLR) {		if (MustSaveCR && SingleScratchReg && MustSaveLR) {
// In the ELFv2 ABI, we are not required to save all CR fields.		// In the ELFv2 ABI, we are not required to save all CR fields.
// If only one or two CR fields are clobbered, it is more efficient to use		// If only one or two CR fields are clobbered, it is more efficient to use
// mfocrf to selectively save just those fields, because mfocrf has short		// mfocrf to selectively save just those fields, because mfocrf has short
// latency compares to mfcr.		// latency compares to mfcr.
unsigned MfcrOpcode = PPC::MFCR8;		unsigned MfcrOpcode = PPC::MFCR8;
[47 unchanged lines hidden]	if (HasRedZone) {
if (HasBP)		if (HasBP)
BuildMI(MBB, MBBI, dl, StoreInst)		BuildMI(MBB, MBBI, dl, StoreInst)
.addReg(BPReg)		.addReg(BPReg)
.addImm(BPOffset)		.addImm(BPOffset)
.addReg(SPReg);		.addReg(SPReg);
}		}

if (MustSaveLR)		if (MustSaveLR)
BuildMI(MBB, MBBI, dl, StoreInst)		BuildMI(MBB, StackUpdateLoc, dl, StoreInst)
.addReg(ScratchReg, getKillRegState(true))		.addReg(ScratchReg, getKillRegState(true))
.addImm(LROffset)		.addImm(LROffset)
.addReg(SPReg);		.addReg(SPReg);

if (MustSaveCR &&		if (MustSaveCR &&
!(SingleScratchReg && MustSaveLR)) { // will only occur for PPC64		!(SingleScratchReg && MustSaveLR)) { // will only occur for PPC64
assert(HasRedZone && "A red zone is always available on PPC64");		assert(HasRedZone && "A red zone is always available on PPC64");
BuildMI(MBB, MBBI, dl, TII.get(PPC::STW8))		BuildMI(MBB, MBBI, dl, TII.get(PPC::STW8))
[51 unchanged lines hidden]	if (HasBP && MaxAlign > 1) {

BuildMI(MBB, MBBI, dl, StoreUpdtIdxInst, SPReg)		BuildMI(MBB, MBBI, dl, StoreUpdtIdxInst, SPReg)
.addReg(SPReg, RegState::Kill)		.addReg(SPReg, RegState::Kill)
.addReg(SPReg)		.addReg(SPReg)
.addReg(ScratchReg);		.addReg(ScratchReg);
HasSTUX = true;		HasSTUX = true;

} else if (!isLargeFrame) {		} else if (!isLargeFrame) {
BuildMI(MBB, MBBI, dl, StoreUpdtInst, SPReg)		BuildMI(MBB, StackUpdateLoc, dl, StoreUpdtInst, SPReg)
.addReg(SPReg)		.addReg(SPReg)
.addImm(NegFrameSize)		.addImm(NegFrameSize)
.addReg(SPReg);		.addReg(SPReg);

} else {		} else {
BuildMI(MBB, MBBI, dl, LoadImmShiftedInst, ScratchReg)		BuildMI(MBB, MBBI, dl, LoadImmShiftedInst, ScratchReg)
.addImm(NegFrameSize >> 16);		.addImm(NegFrameSize >> 16);
BuildMI(MBB, MBBI, dl, OrImmInst, ScratchReg)		BuildMI(MBB, MBBI, dl, OrImmInst, ScratchReg)
[223 unchanged lines hidden]	for (unsigned I = 0, E = CSI.size(); I != E; ++I) {
unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(		unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
nullptr, MRI->getDwarfRegNum(CRReg, true), 8));		nullptr, MRI->getDwarfRegNum(CRReg, true), 8));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))		BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);		.addCFIIndex(CFIIndex);
continue;		continue;
}		}

int Offset = MFI.getObjectOffset(CSI[I].getFrameIdx());		int Offset = MFI.getObjectOffset(CSI[I].getFrameIdx());
		// We have changed the object offset above but we do not want to change
		// the actual offsets in the CFI instruction so we have to undo the
		// offset change here.
		if (MovingStackUpdateDown)
		inouehrs (inline comment, Not Done): Since we have only one `MovingStackUpdateDown`, all CSIs are modified back even if only a part of them were updated above.
		nemanjai (inline comment, Not Done): @stefanp Can you respond to this comment? Is this a problem? Can we get into a situation where we end up with incorrect offsets if not all of the callee-saved spills/restores have been moved/updated?
		Offset -= NegFrameSize;

unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(		unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
nullptr, MRI->getDwarfRegNum(Reg, true), Offset));		nullptr, MRI->getDwarfRegNum(Reg, true), Offset));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))		BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);		.addCFIIndex(CFIIndex);
}		}
}		}
}		}
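
The offset adjustment and the question raised by inouehrs can be traced with plain arithmetic. The sketch below is illustrative only: `Slot`, `shiftForEarlySpill` and `cfiOffset` are hypothetical names, and the numbers are the r30 slot (-16) and 48-byte frame that appear in the MCSE-caller-preserved-reg.ll checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one callee-saved slot; not an LLVM type.
struct Slot {
  int64_t Offset; // offset as recorded in the frame info
  bool Shifted;   // whether the prologue loop added NegFrameSize to it
};

// Mirrors MFI.setObjectOffset(FrIdx, Offset + NegFrameSize) in the prologue.
void shiftForEarlySpill(Slot &S, int64_t NegFrameSize) {
  S.Offset += NegFrameSize;
  S.Shifted = true;
}

// Mirrors the patch: a single function-wide flag decides whether the shift
// is undone before the offset is handed to the CFI instruction.
int64_t cfiOffset(const Slot &S, bool MovingStackUpdateDown,
                  int64_t NegFrameSize) {
  int64_t Offset = S.Offset;
  if (MovingStackUpdateDown)
    Offset -= NegFrameSize;
  return Offset;
}
```

For a slot that was shifted, -16 becomes -64 and the CFI value comes back as -16, matching `.cfi_offset r30, -16`. For a slot that was not shifted, the same undo step would turn -24 into 24, which is the situation the reviewers ask the author to rule out.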

[129 unchanged lines hidden]	void PPCFrameLowering::emitEpilogue(MachineFunction &MF,
// them. In such case, the final update of SP will be to add the frame		// them. In such case, the final update of SP will be to add the frame
// size to it.		// size to it.
// To simplify the code, set RBReg to the base register used to restore		// To simplify the code, set RBReg to the base register used to restore
// values from the stack, and set SPAdd to the value that needs to be added		// values from the stack, and set SPAdd to the value that needs to be added
// to the SP at the end. The default values are as if red zone was present.		// to the SP at the end. The default values are as if red zone was present.
unsigned RBReg = SPReg;		unsigned RBReg = SPReg;
unsigned SPAdd = 0;		unsigned SPAdd = 0;

		// Check if we can move the stack update instruction up the epilogue
		// past the callee saves. This will allow the move to LR instruction
		nemanjai (inline comment, Not Done): There are a few places where there are extra spaces in comments. But I'm sure that it'll all get fixed up when you run clang-format-diff.
		lei (inline comment, Not Done): I don't think you addressed this issue with the extra spaces in comments. From what I can see we don't put an extra space after `//` when continuing a comment from the previous line.
		// to be executed before the restores of the callee saves which means
		// that the callee saves can hide the latency from the MTLR instrcution.
		const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
		bool RequiresScavenging = TRI->requiresFrameIndexScavenging(MF);
		MachineBasicBlock::iterator StackUpdateLoc = MBBI;
		if (FrameSize && !FI->hasFastCall() && !FI->usesPICBase() && !HasFP &&
		nemanjai (inline comment, Not Done): The condition here looks like a repeat of the one in the prologue inserter. Can we actually extract this into a function that can be queried by both? Then this can be changed to something like `if (stackUpdateCanBeMoved())` or something equivalently simple.
		!HasBP && !isLargeFrame && !RequiresScavenging && Subtarget.isPPC64()) {
		const std::vector<CalleeSavedInfo> & Info = MFI.getCalleeSavedInfo();
		for (CalleeSavedInfo CSI : Info) {
		syzaara (inline comment, Done): Can use a range-based for loop here as well.
		int FrIdx = CSI.getFrameIdx();
		nemanjai (inline comment, Not Done): s/stack update pointer/update of the stack pointer/
		if (FrIdx < 0) {
		nemanjai (inline comment, Not Done): This probably applies both here and above, but doesn't the condition `FrIdx > 0` actually mean we should not do this at all? I'm not sure if it's possible, but what if you had frame indices that are, say, { negative, negative, positive, negative } as you iterate? Wouldn't you want to leave the function alone then? Also, what is it about `FrIdx == 0` that prevents you from moving the stack pointer update up past it? For that matter, there should be a comment on why we require the condition at all. Finally, please add a comment as to why it is not necessary to manually update the offsets for the CSR spills and restores - we just move the stack ptr update and something else figures out the offsets.
		lei (inline comment, Not Done): Merge this `if` statement with the nested `if`.
		nemanjai (inline comment, Not Done): Same comment as a similar check in the prologue inserter.
		if (MFI.isFixedObjectIndex(FrIdx) && MFI.getObjectOffset(FrIdx) < 0)
		StackUpdateLoc--;
		}
		}
		}
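
nemanjai's request to share the condition between the prologue and epilogue emitters could look roughly like the sketch below. It assumes a hypothetical `FrameFacts` struct so the example stands alone; none of these names exist in LLVM, and the real helper would read the values from the function info, frame info and subtarget.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the facts both emitters test.
struct FrameFacts {
  int64_t FrameSize;
  bool HasFastCall;
  bool UsesPICBase;
  bool HasFP;
  bool HasBP;
  bool RequiresScavenging;
  bool IsPPC64;
};

// Same range check as llvm::isInt<16>, written out to keep the sketch
// free of LLVM headers.
bool fitsInSigned16(int64_t V) { return V >= -32768 && V <= 32767; }

// One predicate that both emitPrologue and emitEpilogue could query.
bool stackUpdateCanBeMoved(const FrameFacts &F) {
  return F.FrameSize != 0 && !F.HasFastCall && !F.UsesPICBase && !F.HasFP &&
         !F.HasBP && fitsInSigned16(F.FrameSize) && !F.RequiresScavenging &&
         F.IsPPC64;
}
```

The sketch follows the prologue's spelling of the size test, `isInt<16>(FrameSize)`. The epilogue in this diff writes `!isLargeFrame` instead, so a shared helper would have to settle on one of the two forms.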

		nemanjai (inline comment, Not Done): `// Abort the operation as we can't update all CSR restores.`
if (FrameSize) {		if (FrameSize) {
// In the prologue, the loaded (or persistent) stack pointer value is		// In the prologue, the loaded (or persistent) stack pointer value is
// offset by the STDU/STDUX/STWU/STWUX instruction. For targets with red		// offset by the STDU/STDUX/STWU/STWUX instruction. For targets with red
// zone add this offset back now.		// zone add this offset back now.

// If this function contained a fastcc call and GuaranteedTailCallOpt is		// If this function contained a fastcc call and GuaranteedTailCallOpt is
// enabled (=> hasFastCall()==true) the fastcc call might contain a tail		// enabled (=> hasFastCall()==true) the fastcc call might contain a tail
// call which invalidates the stack pointer value in SP(0). So we use the		// call which invalidates the stack pointer value in SP(0). So we use the
[13 unchanged lines hidden]	if (FI->hasFastCall()) {
.addImm(FrameSize & 0xFFFF);		.addImm(FrameSize & 0xFFFF);
BuildMI(MBB, MBBI, dl, AddInst)		BuildMI(MBB, MBBI, dl, AddInst)
.addReg(RBReg)		.addReg(RBReg)
.addReg(FPReg)		.addReg(FPReg)
.addReg(ScratchReg);		.addReg(ScratchReg);
}		}
} else if (!isLargeFrame && !HasBP && !MFI.hasVarSizedObjects()) {		} else if (!isLargeFrame && !HasBP && !MFI.hasVarSizedObjects()) {
if (HasRedZone) {		if (HasRedZone) {
BuildMI(MBB, MBBI, dl, AddImmInst, SPReg)		BuildMI(MBB, StackUpdateLoc, dl, AddImmInst, SPReg)
.addReg(SPReg)		.addReg(SPReg)
.addImm(FrameSize);		.addImm(FrameSize);
} else {		} else {
// Make sure that adding FrameSize will not overflow the max offset		// Make sure that adding FrameSize will not overflow the max offset
// size.		// size.
assert(FPOffset <= 0 && BPOffset <= 0 && PBPOffset <= 0 &&		assert(FPOffset <= 0 && BPOffset <= 0 && PBPOffset <= 0 &&
"Local offsets should be negative");		"Local offsets should be negative");
SPAdd = FrameSize;		SPAdd = FrameSize;
FPOffset += FrameSize;		FPOffset += FrameSize;
BPOffset += FrameSize;		BPOffset += FrameSize;
PBPOffset += FrameSize;		PBPOffset += FrameSize;
}		}
} else {		} else {
// We don't want to use ScratchReg as a base register, because it		// We don't want to use ScratchReg as a base register, because it
// could happen to be R0. Use FP instead, but make sure to preserve it.		// could happen to be R0. Use FP instead, but make sure to preserve it.
if (!HasRedZone) {		if (!HasRedZone) {
// If FP is not saved, copy it to ScratchReg.		// If FP is not saved, copy it to ScratchReg.
if (!HasFP)		if (!HasFP)
BuildMI(MBB, MBBI, dl, OrInst, ScratchReg)		BuildMI(MBB, MBBI, dl, OrInst, ScratchReg)
.addReg(FPReg)		.addReg(FPReg)
.addReg(FPReg);		.addReg(FPReg);
RBReg = FPReg;		RBReg = FPReg;
}		}
BuildMI(MBB, MBBI, dl, LoadInst, RBReg)		BuildMI(MBB, StackUpdateLoc, dl, LoadInst, RBReg)
.addImm(0)		.addImm(0)
.addReg(SPReg);		.addReg(SPReg);
}		}
}		}
assert(RBReg != ScratchReg && "Should have avoided ScratchReg");		assert(RBReg != ScratchReg && "Should have avoided ScratchReg");
// If there is no red zone, ScratchReg may be needed for holding a useful		// If there is no red zone, ScratchReg may be needed for holding a useful
// value (although not the base register). Make sure it is not overwritten		// value (although not the base register). Make sure it is not overwritten
// too early.		// too early.
[16 unchanged lines hidden]	void PPCFrameLowering::emitEpilogue(MachineFunction &MF,
}		}

// Delay restoring of the LR if ScratchReg is needed. This is ok, since		// Delay restoring of the LR if ScratchReg is needed. This is ok, since
// LR is stored in the caller's stack frame. ScratchReg will be needed		// LR is stored in the caller's stack frame. ScratchReg will be needed
// if RBReg is anything other than SP. We shouldn't use ScratchReg as		// if RBReg is anything other than SP. We shouldn't use ScratchReg as
// a base register anyway, because it may happen to be R0.		// a base register anyway, because it may happen to be R0.
bool LoadedLR = false;		bool LoadedLR = false;
if (MustSaveLR && RBReg == SPReg && isInt<16>(LROffset+SPAdd)) {		if (MustSaveLR && RBReg == SPReg && isInt<16>(LROffset+SPAdd)) {
BuildMI(MBB, MBBI, dl, LoadInst, ScratchReg)		BuildMI(MBB, StackUpdateLoc, dl, LoadInst, ScratchReg)
.addImm(LROffset+SPAdd)		.addImm(LROffset+SPAdd)
.addReg(RBReg);		.addReg(RBReg);
LoadedLR = true;		LoadedLR = true;
}		}

if (MustSaveCR && !(SingleScratchReg && MustSaveLR)) {		if (MustSaveCR && !(SingleScratchReg && MustSaveLR)) {
// This will only occur for PPC64.		// This will only occur for PPC64.
assert(isPPC64 && "Expecting 64-bit mode");		assert(isPPC64 && "Expecting 64-bit mode");
[43 unchanged lines hidden]	if (RBReg != SPReg || SPAdd != 0) {
assert(RBReg != ScratchReg && "Should be using FP or SP as base register");		assert(RBReg != ScratchReg && "Should be using FP or SP as base register");
if (RBReg == FPReg)		if (RBReg == FPReg)
BuildMI(MBB, MBBI, dl, OrInst, FPReg)		BuildMI(MBB, MBBI, dl, OrInst, FPReg)
.addReg(ScratchReg)		.addReg(ScratchReg)
.addReg(ScratchReg);		.addReg(ScratchReg);

// Now load the LR from the caller's stack frame.		// Now load the LR from the caller's stack frame.
if (MustSaveLR && !LoadedLR)		if (MustSaveLR && !LoadedLR)
BuildMI(MBB, MBBI, dl, LoadInst, ScratchReg)		BuildMI(MBB, MBBI, dl, LoadInst, ScratchReg)
		inouehrs (inline comment, Not Done): If the latency of MTLR is the critical path in the epilogue, can we move this load before the stack pointer update (with an adjusted offset) to hide the latency further? (But this can be a separate patch.)
		stefanp (author, inline comment, Not Done): That's a good point. When I did the original performance tests I had moved the mtlr past the callee restores and not all the way past the stack pointer update. I'm going to update this patch without that change and then I'll put in another patch with that change alone.
.addImm(LROffset)		.addImm(LROffset)
.addReg(SPReg);		.addReg(SPReg);
}		}

if (MustSaveCR &&		if (MustSaveCR &&
!(SingleScratchReg && MustSaveLR)) // will only occur for PPC64		!(SingleScratchReg && MustSaveLR)) // will only occur for PPC64
for (unsigned i = 0, e = MustSaveCRs.size(); i != e; ++i)		for (unsigned i = 0, e = MustSaveCRs.size(); i != e; ++i)
BuildMI(MBB, MBBI, dl, TII.get(PPC::MTOCRF8), MustSaveCRs[i])		BuildMI(MBB, MBBI, dl, TII.get(PPC::MTOCRF8), MustSaveCRs[i])
.addReg(TempReg, getKillRegState(i == e-1));		.addReg(TempReg, getKillRegState(i == e-1));

if (MustSaveLR)		if (MustSaveLR)
BuildMI(MBB, MBBI, dl, MTLRInst).addReg(ScratchReg);		BuildMI(MBB, StackUpdateLoc, dl, MTLRInst).addReg(ScratchReg);

// Callee pop calling convention. Pop parameter/linkage area. Used for tail		// Callee pop calling convention. Pop parameter/linkage area. Used for tail
// call optimization		// call optimization
if (IsReturnBlock) {		if (IsReturnBlock) {
unsigned RetOpcode = MBBI->getOpcode();		unsigned RetOpcode = MBBI->getOpcode();
if (MF.getTarget().Options.GuaranteedTailCallOpt &&		if (MF.getTarget().Options.GuaranteedTailCallOpt &&
(RetOpcode == PPC::BLR || RetOpcode == PPC::BLR8) &&		(RetOpcode == PPC::BLR || RetOpcode == PPC::BLR8) &&
MF.getFunction().getCallingConv() == CallingConv::Fast) {		MF.getFunction().getCallingConv() == CallingConv::Fast) {
[656 unchanged lines hidden]

lib/Target/PowerPC/PPCRegisterInfo.h

[86 unchanged lines hidden]	public:

bool enableMultipleCopyHints() const override { return true; }		bool enableMultipleCopyHints() const override { return true; }

/// We require the register scavenger.		/// We require the register scavenger.
bool requiresRegisterScavenging(const MachineFunction &MF) const override {		bool requiresRegisterScavenging(const MachineFunction &MF) const override {
return true;		return true;
}		}

bool requiresFrameIndexScavenging(const MachineFunction &MF) const override {		bool requiresFrameIndexScavenging(const MachineFunction &MF) const override;
return true;
}

bool trackLivenessAfterRegAlloc(const MachineFunction &MF) const override {		bool trackLivenessAfterRegAlloc(const MachineFunction &MF) const override {
return true;		return true;
}		}

bool requiresVirtualBaseRegisters(const MachineFunction &MF) const override {		bool requiresVirtualBaseRegisters(const MachineFunction &MF) const override {
return true;		return true;
}		}
[43 unchanged lines hidden]

lib/Target/PowerPC/PPCRegisterInfo.cpp

[295 unchanged lines hidden]	if (!Subtarget.hasAltivec())
for (TargetRegisterClass::iterator I = PPC::VRRCRegClass.begin(),		for (TargetRegisterClass::iterator I = PPC::VRRCRegClass.begin(),
IE = PPC::VRRCRegClass.end(); I != IE; ++I)		IE = PPC::VRRCRegClass.end(); I != IE; ++I)
markSuperRegs(Reserved, *I);		markSuperRegs(Reserved, *I);

assert(checkAllSuperRegsMarked(Reserved));		assert(checkAllSuperRegsMarked(Reserved));
return Reserved;		return Reserved;
}		}

		bool PPCRegisterInfo::requiresFrameIndexScavenging(const MachineFunction &MF) const {
		const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
		const PPCInstrInfo *InstrInfo = Subtarget.getInstrInfo();
		const MachineFrameInfo &MFI = MF.getFrameInfo();
		const std::vector<CalleeSavedInfo> &Info = MFI.getCalleeSavedInfo();

		// If the saved info is invalid we have to default to true for safety.
		nemanjai (inline comment, Not Done): "saved info" doesn't really mean anything in this context. The CalleeSavedInfo type is a structure containing information about Callee Saved Registers.
		if (!MFI.isCalleeSavedInfoValid())
		return true;

		// The saved info is valid so it can be traversed.
		nemanjai (inline comment, Not Done): Same as above.
		// Checking for registers that need saving that do not have load or store
		// forms where the address offset is an immediate.
		for (unsigned i=0; i<Info.size(); i++) {
		nemanjai (inline comment, Not Done): Formatting.
		int FrIdx = Info[i].getFrameIdx();
		nemanjai (inline comment, Not Done): I think a constant such as `0x7FFF` communicates intent more clearly. Furthermore, a condition such as this may clearly illustrate that we can't have any higher bits set: `if (FrameSize & ~0x7FFF)`
		unsigned Reg = Info[i].getReg();

		// The requiresFrameIndexScavenging function is only called when we emit
		nemanjai (inline comment, Not Done): Indices.
		// prologue or epilogue code. Therefore we do not need to consider stack
		// objects that are not fixed.
		// Also, checked shrinkwrapping and the prologue / epilogue will not be
		// moved past a frame related operation.
		if (!MFI.isFixedObjectIndex(FrIdx)) continue;

		unsigned Opcode = InstrInfo->getStoreOpcodeForSpill(Reg);
		if (InstrInfo->isXFormMemOp(Opcode))
		return true;
		nemanjai (inline comment, Not Done): I am not sure I really follow when we need FI scavenging... I was under the impression that we might need it if:
		- The frame size is too large so we need to use an X-Form store/load for the spill/restore
		- The alignment of the spill/restore is lower than what is required by the D-Form (4 for DS-Form, 16 for DQ-Form)
		- The only opcode for the spill/restore is an X-Form
		This if statement certainly seems to accomplish the last of those but I don't see anything that accomplishes the other two. The first can certainly be an early exit before the loop.
		}
		return false;
		}
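
The three conditions nemanjai lists, together with the `0x7FFF` mask suggested in an earlier comment, can be combined as in the sketch below. This is a reading of the review comments, not the committed implementation: `SpillInfo` and `needsFrameIndexScavenging` are hypothetical names and the struct fields are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one callee-saved spill; not an LLVM type.
struct SpillInfo {
  bool IsFixedObject;  // only fixed objects matter in prologue/epilogue code
  bool OnlyHasXForm;   // no register + immediate store exists for this spill
  unsigned Alignment;  // alignment of the stack slot, in bytes
  unsigned DFormAlign; // alignment the D-Form needs, e.g. 4 for DS, 16 for DQ
};

bool needsFrameIndexScavenging(uint64_t FrameSize,
                               const std::vector<SpillInfo> &Spills) {
  // Early exit before the loop: any bit above the low 15 means the frame is
  // too large for a 16-bit signed displacement.
  if (FrameSize & ~UINT64_C(0x7FFF))
    return true;
  for (const SpillInfo &S : Spills) {
    if (!S.IsFixedObject)
      continue;
    // Either there is no D-Form at all, or the slot is not aligned enough
    // for the D-Form that exists.
    if (S.OnlyHasXForm || S.Alignment < S.DFormAlign)
      return true;
  }
  return false;
}
```

The loop in the diff covers only the X-Form case; the frame-size and alignment tests here correspond to the two conditions the reviewer says are missing.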

bool PPCRegisterInfo::isCallerPreservedPhysReg(unsigned PhysReg,		bool PPCRegisterInfo::isCallerPreservedPhysReg(unsigned PhysReg,
const MachineFunction &MF) const {		const MachineFunction &MF) const {
assert(TargetRegisterInfo::isPhysicalRegister(PhysReg));		assert(TargetRegisterInfo::isPhysicalRegister(PhysReg));
if (TM.isELFv2ABI() && PhysReg == PPC::X2) {		if (TM.isELFv2ABI() && PhysReg == PPC::X2) {
// X2 is guaranteed to be preserved within a function if it is reserved.		// X2 is guaranteed to be preserved within a function if it is reserved.
// The reason it's reserved is that it's the TOC pointer (and the function		// The reason it's reserved is that it's the TOC pointer (and the function
// uses the TOC). In functions where it isn't reserved (i.e. leaf functions		// uses the TOC). In functions where it isn't reserved (i.e. leaf functions
// with no TOC access), we can't claim that it is preserved.		// with no TOC access), we can't claim that it is preserved.
[11 unchanged lines hidden]	unsigned PPCRegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
switch (RC->getID()) {		switch (RC->getID()) {
default:		default:
return 0;		return 0;
case PPC::G8RC_NOX0RegClassID:		case PPC::G8RC_NOX0RegClassID:
case PPC::GPRC_NOR0RegClassID:		case PPC::GPRC_NOR0RegClassID:
case PPC::G8RCRegClassID:		case PPC::G8RCRegClassID:
case PPC::GPRCRegClassID: {		case PPC::GPRCRegClassID: {
unsigned FP = TFI->hasFP(MF) ? 1 : 0;		unsigned FP = TFI->hasFP(MF) ? 1 : 0;
return 32 - FP - DefaultSafety;		return 32 - FP - DefaultSafety;
		nemanjai (inline comment, Not Done): We use spaces around binary/assignment operators. Maybe just clang-format-diff the patch.
}		}
case PPC::F8RCRegClassID:		case PPC::F8RCRegClassID:
case PPC::F4RCRegClassID:		case PPC::F4RCRegClassID:
		lei (inline comment, Done): Maybe an early exit here instead: `if (FrIdx >= 0) continue;`
case PPC::QFRCRegClassID:		case PPC::QFRCRegClassID:
case PPC::QSRCRegClassID:		case PPC::QSRCRegClassID:
case PPC::QBRCRegClassID:		case PPC::QBRCRegClassID:
case PPC::VRRCRegClassID:		case PPC::VRRCRegClassID:
		nemanjai (inline comment, Not Done): I'm not a fan of this solution. It provides yet another place we check for the register class for a physical register without a clear explanation for why we care about the register class. I'd much prefer a unified solution between `StoreRegToStackSlot()`, `isStoreToStackSlot()` and `requiresFrameIndexScavenging()`. What I'm thinking is something along the lines of:
		    static const unsigned OpcodesForSpills[] = { PPC::STD, PPC::STW, ... };
		    PPCInstrInfo::getOpcodeForSpill(unsigned Reg, const TargetRegisterClass *RC = nullptr);
		That way we'd have a single definitive list of opcodes that are used for spilling registers and wouldn't have to keep this delicate dance of keeping multiple functions in sync.
		- `isStoreToStackSlot()` would just check the array to see if its opcode is in there
		- `StoreRegToStackSlot()` and `requiresFrameIndexScavenging()` would use `getOpcodeForSpill()` with the register class or physical register respectively
		- `getOpcodeForSpill()` would just compute the index into the array based on the register class and target features and return the respective element
		Of course, it doesn't have to be done that way, but any solution that would unify this would be good.
case PPC::VFRCRegClassID:		case PPC::VFRCRegClassID:
case PPC::VSLRCRegClassID:		case PPC::VSLRCRegClassID:
return 32 - DefaultSafety;		return 32 - DefaultSafety;
case PPC::VSRCRegClassID:		case PPC::VSRCRegClassID:
case PPC::VSFRCRegClassID:		case PPC::VSFRCRegClassID:
case PPC::VSSRCRegClassID:		case PPC::VSSRCRegClassID:
return 64 - DefaultSafety;		return 64 - DefaultSafety;
case PPC::CRRCRegClassID:		case PPC::CRRCRegClassID:
[789 unchanged lines hidden]
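
nemanjai's single-table proposal in the comment above can be sketched without LLVM as follows. The enums are placeholders standing in for PPC opcodes and register classes, the class-to-opcode mapping is only illustrative, and the target-feature part of the index computation is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Placeholder enums; the real code would use PPC:: opcodes and register
// class IDs.
enum Opcode { STD, STW, STFD, STXVD2X, ADDI };
enum RegClass { G8RC, GPRC, F8RC, VSRC, NumRegClasses };

// The one definitive list of spill opcodes, indexed by register class.
static const Opcode OpcodesForSpills[NumRegClasses] = {STD, STW, STFD,
                                                       STXVD2X};

// Used by the store-to-stack-slot and scavenging queries.
Opcode getOpcodeForSpill(RegClass RC) { return OpcodesForSpills[RC]; }

// Used by the "is this a store to a stack slot" query.
bool isSpillOpcode(Opcode Opc) {
  return std::find(std::begin(OpcodesForSpills), std::end(OpcodesForSpills),
                   Opc) != std::end(OpcodesForSpills);
}
```

Because both queries read the same array, adding a spill opcode in one place keeps them in sync, which is the property the comment asks for.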

test/CodeGen/PowerPC/MCSE-caller-preserved-reg.ll

	[9 unchanged lines hidden]
	%class.CC = type { %struct.SS }			%class.CC = type { %struct.SS }
	%struct.SS = type { void ()* }			%struct.SS = type { void ()* }

	@_ZN2CC2ccE = external thread_local global %"struct.CC::TT", align 8			@_ZN2CC2ccE = external thread_local global %"struct.CC::TT", align 8

	define noalias i8* @_ZN2CC3funEv(%class.CC* %this) {			define noalias i8* @_ZN2CC3funEv(%class.CC* %this) {
	; CHECK-LABEL: _ZN2CC3funEv:			; CHECK-LABEL: _ZN2CC3funEv:
	; CHECK: mflr 0			; CHECK: mflr 0
	; CHECK-NEXT: std 0, 16(1)
	; CHECK-NEXT: stdu 1, -48(1)
	; CHECK-NEXT: .cfi_def_cfa_offset 48			; CHECK-NEXT: .cfi_def_cfa_offset 48
	; CHECK-NEXT: .cfi_offset lr, 16			; CHECK-NEXT: .cfi_offset lr, 16
	; CHECK-NEXT: .cfi_offset r30, -16			; CHECK-NEXT: .cfi_offset r30, -16
				; CHECK-NEXT: std 30, -16(1)
				; CHECK-NEXT: std 0, 16(1)
				; CHECK-NEXT: stdu 1, -48(1)
	; CHECK-NEXT: ld 12, 0(3)			; CHECK-NEXT: ld 12, 0(3)
	; CHECK-NEXT: std 30, 32(1)
	; CHECK-NEXT: mr 30, 3			; CHECK-NEXT: mr 30, 3
	; CHECK-NEXT: std 2, 24(1)			; CHECK-NEXT: std 2, 24(1)
	; CHECK-NEXT: mtctr 12			; CHECK-NEXT: mtctr 12
	; CHECK-NEXT: bctrl			; CHECK-NEXT: bctrl
	; CHECK-NEXT: ld 2, 24(1)			; CHECK-NEXT: ld 2, 24(1)
	; CHECK-NEXT: addis 3, 2, _ZN2CC2ccE@got@tlsgd@ha			; CHECK-NEXT: addis 3, 2, _ZN2CC2ccE@got@tlsgd@ha
	; CHECK-NEXT: addi 3, 3, _ZN2CC2ccE@got@tlsgd@l			; CHECK-NEXT: addi 3, 3, _ZN2CC2ccE@got@tlsgd@l
	; CHECK-NEXT: bl __tls_get_addr(_ZN2CC2ccE@tlsgd)			; CHECK-NEXT: bl __tls_get_addr(_ZN2CC2ccE@tlsgd)
	; CHECK-NEXT: nop			; CHECK-NEXT: nop
	; CHECK-NEXT: ld 4, 0(3)			; CHECK-NEXT: ld 4, 0(3)
	; CHECK-NEXT: cmpldi 4, 0			; CHECK-NEXT: cmpldi 4, 0
	; CHECK-NEXT: beq 0, .LBB0_2			; CHECK-NEXT: beq 0, .LBB0_2
	; CHECK: addi 4, 3, 8			; CHECK: addi 4, 3, 8
	; CHECK-NEXT: mr 3, 30			; CHECK-NEXT: mr 3, 30
	; CHECK-NEXT: bl _ZN2CC3barEPi			; CHECK-NEXT: bl _ZN2CC3barEPi
	; CHECK-NEXT: nop			; CHECK-NEXT: nop
	; CHECK: ld 30, 32(1)			; CHECK: li 3, 0
	; CHECK-NEXT: li 3, 0
	; CHECK-NEXT: addi 1, 1, 48			; CHECK-NEXT: addi 1, 1, 48
	; CHECK-NEXT: ld 0, 16(1)			; CHECK-NEXT: ld 0, 16(1)
	; CHECK-NEXT: mtlr 0			; CHECK-NEXT: mtlr 0
				; CHECK: ld 30, -16(1)
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	entry:			entry:
	%foo = getelementptr inbounds %class.CC, %class.CC* %this, i64 0, i32 0, i32 0			%foo = getelementptr inbounds %class.CC, %class.CC* %this, i64 0, i32 0, i32 0
	%0 = load void (), void ()* %foo, align 8			%0 = load void (), void ()* %foo, align 8
	tail call void %0()			tail call void %0()
	%1 = load i64, i64* getelementptr inbounds (%"struct.CC::TT", %"struct.CC::TT"* @_ZN2CC2ccE, i64 0, i32 0)			%1 = load i64, i64* getelementptr inbounds (%"struct.CC::TT", %"struct.CC::TT"* @_ZN2CC2ccE, i64 0, i32 0)
	%tobool = icmp eq i64 %1, 0			%tobool = icmp eq i64 %1, 0
	br i1 %tobool, label %if.end, label %if.then			br i1 %tobool, label %if.end, label %if.then
	[10 unchanged lines hidden]

test/CodeGen/PowerPC/ppc-shrink-wrapping.ll

	[104 unchanged lines hidden]
	;			;
	; DISABLE: .[[ELSE_LABEL]]: # %if.else			; DISABLE: .[[ELSE_LABEL]]: # %if.else
	; Shift second argument by one and store into returned register.			; Shift second argument by one and store into returned register.
	; DISABLE: slwi 3, 4, 1			; DISABLE: slwi 3, 4, 1
	; DISABLE: .[[EPILOG_BB]]: # %if.end			; DISABLE: .[[EPILOG_BB]]: # %if.end
	;			;
	; Epilogue code.			; Epilogue code.
	; CHECK: mtlr {{[0-9]+}}			; CHECK: mtlr {{[0-9]+}}
	; CHECK-NEXT: blr			; CHECK: blr
	;			;
	; ENABLE: .[[ELSE_LABEL]]: # %if.else			; ENABLE: .[[ELSE_LABEL]]: # %if.else
	; Shift second argument by one and store into returned register.			; Shift second argument by one and store into returned register.
	; ENABLE: slwi 3, 4, 1			; ENABLE: slwi 3, 4, 1
	; ENABLE-NEXT: blr			; ENABLE-NEXT: blr
	define i32 @freqSaveAndRestoreOutsideLoop(i32 %cond, i32 %N) {			define i32 @freqSaveAndRestoreOutsideLoop(i32 %cond, i32 %N) {
	entry:			entry:
	%tobool = icmp eq i32 %cond, 0			%tobool = icmp eq i32 %cond, 0
	[44 unchanged lines hidden]
	; CHECK-DAG: addi [[IV]], [[IV]], -1			; CHECK-DAG: addi [[IV]], [[IV]], -1
	; CHECK-DAG: add [[SUM]], 3, [[SUM]]			; CHECK-DAG: add [[SUM]], 3, [[SUM]]
	; CHECK-NEXT: cmplwi [[IV]], 0			; CHECK-NEXT: cmplwi [[IV]], 0
	; CHECK-NEXT: bne 0, .[[LOOP]]			; CHECK-NEXT: bne 0, .[[LOOP]]
	;			;
	; Next BB			; Next BB
	; CHECK: %for.end			; CHECK: %for.end
	; CHECK: mtlr {{[0-9]+}}			; CHECK: mtlr {{[0-9]+}}
	; CHECK-NEXT: blr			; CHECK: blr
	define i32 @freqSaveAndRestoreOutsideLoop2(i32 %cond) {			define i32 @freqSaveAndRestoreOutsideLoop2(i32 %cond) {
	entry:			entry:
	br label %for.preheader			br label %for.preheader

	for.preheader:			for.preheader:
	tail call void asm "nop", ""()			tail call void asm "nop", ""()
	br label %for.body			br label %for.body

	[21 unchanged lines hidden]
	;			;
	; ENABLE: cmplwi 0, 3, 0			; ENABLE: cmplwi 0, 3, 0
	; ENABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]			; ENABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; Prologue code.			; Prologue code.
	; Make sure we save the link register			; Make sure we save the link register
	; CHECK: mflr {{[0-9]+}}			; CHECK: mflr {{[0-9]+}}
	;			;
	; DISABLE: cmplwi 0, 3, 0			; DISABLE: std
	; DISABLE-NEXT: std
	; DISABLE-NEXT: std			; DISABLE-NEXT: std
				; DISABLE: cmplwi 0, 3, 0
	; DISABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]			; DISABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; Loop preheader			; Loop preheader
	; CHECK-DAG: li [[SUM:[0-9]+]], 0			; CHECK-DAG: li [[SUM:[0-9]+]], 0
	; CHECK-DAG: li [[IV:[0-9]+]], 10			; CHECK-DAG: li [[IV:[0-9]+]], 10
	;			;
	; Loop body			; Loop body
	; CHECK: .[[LOOP:LBB[0-9_]+]]: # %for.body			; CHECK: .[[LOOP:LBB[0-9_]+]]: # %for.body
	[12 unchanged lines hidden]
	;			;
	; DISABLE: .[[ELSE_LABEL]]: # %if.else			; DISABLE: .[[ELSE_LABEL]]: # %if.else
	; Shift second argument by one and store into returned register.			; Shift second argument by one and store into returned register.
	; DISABLE: slwi 3, 4, 1			; DISABLE: slwi 3, 4, 1
	;			;
	; DISABLE: .[[EPILOG_BB]]: # %if.end			; DISABLE: .[[EPILOG_BB]]: # %if.end
	; Epilog code			; Epilog code
	; CHECK: mtlr {{[0-9]+}}			; CHECK: mtlr {{[0-9]+}}
	; CHECK-NEXT: blr			; CHECK: blr
	;			;
	; ENABLE: .[[ELSE_LABEL]]: # %if.else			; ENABLE: .[[ELSE_LABEL]]: # %if.else
	; Shift second argument by one and store into returned register.			; Shift second argument by one and store into returned register.
	; ENABLE: slwi 3, 4, 1			; ENABLE: slwi 3, 4, 1
	; ENABLE-NEXT: blr			; ENABLE-NEXT: blr
	define i32 @loopInfoSaveOutsideLoop(i32 %cond, i32 %N) {			define i32 @loopInfoSaveOutsideLoop(i32 %cond, i32 %N) {
	entry:			entry:
	%tobool = icmp eq i32 %cond, 0			%tobool = icmp eq i32 %cond, 0
	;			;
	; ENABLE: cmplwi 0, 3, 0			; ENABLE: cmplwi 0, 3, 0
	; ENABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]			; ENABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; Prologue code.			; Prologue code.
	; Make sure we save the link register			; Make sure we save the link register
	; CHECK: mflr {{[0-9]+}}			; CHECK: mflr {{[0-9]+}}
	;			;
	; DISABLE: cmplwi 0, 3, 0			; DISABLE: std
	; DISABLE-NEXT: std
	; DISABLE-NEXT: std			; DISABLE-NEXT: std
				; DISABLE: cmplwi 0, 3, 0
	; DISABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]			; DISABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; CHECK: bl somethingElse			; CHECK: bl somethingElse
	;			;
	; Loop preheader			; Loop preheader
	; CHECK-DAG: li [[SUM:[0-9]+]], 0			; CHECK-DAG: li [[SUM:[0-9]+]], 0
	; CHECK-DAG: li [[IV:[0-9]+]], 10			; CHECK-DAG: li [[IV:[0-9]+]], 10
	;			;
	;			;
	; DISABLE: .[[ELSE_LABEL]]: # %if.else			; DISABLE: .[[ELSE_LABEL]]: # %if.else
	; Shift second argument by one and store into returned register.			; Shift second argument by one and store into returned register.
	; DISABLE: slwi 3, 4, 1			; DISABLE: slwi 3, 4, 1
	; DISABLE: .[[EPILOG_BB]]: # %if.end			; DISABLE: .[[EPILOG_BB]]: # %if.end
	;			;
	; Epilogue code.			; Epilogue code.
	; CHECK: mtlr {{[0-9]+}}			; CHECK: mtlr {{[0-9]+}}
	; CHECK-NEXT: blr			; CHECK: blr
	;			;
	; ENABLE: .[[ELSE_LABEL]]: # %if.else			; ENABLE: .[[ELSE_LABEL]]: # %if.else
	; Shift second argument by one and store into returned register.			; Shift second argument by one and store into returned register.
	; ENABLE: slwi 3, 4, 1			; ENABLE: slwi 3, 4, 1
	; ENABLE-NEXT: blr			; ENABLE-NEXT: blr
	define i32 @loopInfoRestoreOutsideLoop(i32 %cond, i32 %N) #0 {			define i32 @loopInfoRestoreOutsideLoop(i32 %cond, i32 %N) #0 {
	entry:			entry:
	%tobool = icmp eq i32 %cond, 0			%tobool = icmp eq i32 %cond, 0

test/CodeGen/PowerPC/tls_get_addr_clobbers.ll

	; RUN: llc -verify-machineinstrs -mtriple="powerpc64le-unknown-linux-gnu" -relocation-model=pic < %s | FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple="powerpc64le-unknown-linux-gnu" -relocation-model=pic < %s | FileCheck %s

	@a = thread_local global i32* null, align 8			@a = thread_local global i32* null, align 8

	define void @test_foo(i32* nocapture %x01, i32* nocapture %x02, i32* nocapture %x03, i32* nocapture %x04, i32* nocapture %x05, i32* nocapture %x06, i32* nocapture %x07, i32* nocapture %x08) #0 {			define void @test_foo(i32* nocapture %x01, i32* nocapture %x02, i32* nocapture %x03, i32* nocapture %x04, i32* nocapture %x05, i32* nocapture %x06, i32* nocapture %x07, i32* nocapture %x08) #0 {
	entry:			entry:

	; CHECK-LABEL: test_foo:			; CHECK-LABEL: test_foo:
	; CHECK: stdu 1, {{-?[0-9]+}}(1)			; CHECK-DAG: stdu 1, {{-?[0-9]+}}(1)
	; CHECK-DAG: mr [[BACKUP_3:[0-9]+]], 3			; CHECK-DAG: mr [[BACKUP_3:[0-9]+]], 3
	; CHECK-DAG: mr [[BACKUP_4:[0-9]+]], 4			; CHECK-DAG: mr [[BACKUP_4:[0-9]+]], 4
	; CHECK-DAG: mr [[BACKUP_5:[0-9]+]], 5			; CHECK-DAG: mr [[BACKUP_5:[0-9]+]], 5
	; CHECK-DAG: mr [[BACKUP_6:[0-9]+]], 6			; CHECK-DAG: mr [[BACKUP_6:[0-9]+]], 6
	; CHECK-DAG: mr [[BACKUP_7:[0-9]+]], 7			; CHECK-DAG: mr [[BACKUP_7:[0-9]+]], 7
	; CHECK-DAG: mr [[BACKUP_8:[0-9]+]], 8			; CHECK-DAG: mr [[BACKUP_8:[0-9]+]], 8
	; CHECK-DAG: mr [[BACKUP_9:[0-9]+]], 9			; CHECK-DAG: mr [[BACKUP_9:[0-9]+]], 9
	; CHECK-DAG: mr [[BACKUP_10:[0-9]+]], 10			; CHECK-DAG: mr [[BACKUP_10:[0-9]+]], 10
	; CHECK-DAG: std [[BACKUP_3]], {{[0-9]+}}(1)			; CHECK-DAG: std [[BACKUP_3]], {{-?[0-9]+}}(1)
	; CHECK-DAG: std [[BACKUP_4]], {{[0-9]+}}(1)			; CHECK-DAG: std [[BACKUP_4]], {{-?[0-9]+}}(1)
	; CHECK-DAG: std [[BACKUP_5]], {{[0-9]+}}(1)			; CHECK-DAG: std [[BACKUP_5]], {{-?[0-9]+}}(1)
	; CHECK-DAG: std [[BACKUP_6]], {{[0-9]+}}(1)			; CHECK-DAG: std [[BACKUP_6]], {{-?[0-9]+}}(1)
	; CHECK-DAG: std [[BACKUP_7]], {{[0-9]+}}(1)			; CHECK-DAG: std [[BACKUP_7]], {{-?[0-9]+}}(1)
	; CHECK-DAG: std [[BACKUP_8]], {{[0-9]+}}(1)			; CHECK-DAG: std [[BACKUP_8]], {{-?[0-9]+}}(1)
	; CHECK-DAG: std [[BACKUP_9]], {{[0-9]+}}(1)			; CHECK-DAG: std [[BACKUP_9]], {{-?[0-9]+}}(1)
	; CHECK-DAG: std [[BACKUP_10]], {{[0-9]+}}(1)			; CHECK-DAG: std [[BACKUP_10]], {{-?[0-9]+}}(1)
	; CHECK: bl __tls_get_addr			; CHECK: bl __tls_get_addr
	; CHECK-DAG: stw 3, 0([[BACKUP_3]])			; CHECK-DAG: stw 3, 0([[BACKUP_3]])
	; CHECK-DAG: stw 3, 0([[BACKUP_4]])			; CHECK-DAG: stw 3, 0([[BACKUP_4]])
	; CHECK-DAG: stw 3, 0([[BACKUP_5]])			; CHECK-DAG: stw 3, 0([[BACKUP_5]])
	; CHECK-DAG: stw 3, 0([[BACKUP_6]])			; CHECK-DAG: stw 3, 0([[BACKUP_6]])
	; CHECK-DAG: stw 3, 0([[BACKUP_7]])			; CHECK-DAG: stw 3, 0([[BACKUP_7]])
	; CHECK-DAG: stw 3, 0([[BACKUP_8]])			; CHECK-DAG: stw 3, 0([[BACKUP_8]])
	; CHECK-DAG: stw 3, 0([[BACKUP_9]])			; CHECK-DAG: stw 3, 0([[BACKUP_9]])