This is an archive of the discontinued LLVM Phabricator instance.

Differential D114457

[z/OS] Implement prologue and epilogue generation for z/OS target.
ClosedPublic

Authored by Everybody0523 on Nov 23 2021, 10:07 AM.

Details

Reviewers

uweigand
Kai

Commits

rG9a3584499015: [z/OS] Implement prologue and epilogue generation for z/OS target.
rGffad4d777b22: [z/OS] Implement prologue and epilogue generation for z/OS target.

Summary

This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI, for functions with a stack size of less than 1048576 bytes (1 MiB). Larger ("huge") stack frames are not yet supported.

Event Timeline

Everybody0523 created this revision. Nov 23 2021, 10:07 AM

Herald added a subscriber: hiraditya. Nov 23 2021, 10:07 AM

Everybody0523 requested review of this revision. Nov 23 2021, 10:07 AM

Herald added a project: Restricted Project. Nov 23 2021, 10:07 AM

Herald added a subscriber: llvm-commits.

Everybody0523 edited the summary of this revision. Nov 23 2021, 10:26 AM

Harbormaster completed remote builds in B135666: Diff 389244. Nov 23 2021, 2:47 PM

Formatting changes

Harbormaster completed remote builds in B135773: Diff 389383. Nov 23 2021, 9:14 PM

Whitespace change

Harbormaster completed remote builds in B135857: Diff 389516. Nov 24 2021, 9:35 AM

Everybody0523 updated this revision to Diff 389525. Nov 24 2021, 9:40 AM

Harbormaster completed remote builds in B135864: Diff 389525. Nov 24 2021, 10:29 AM

Add more testcases

Harbormaster completed remote builds in B136288: Diff 390130. Nov 26 2021, 7:14 PM

uweigand added inline comments. Nov 28 2021, 6:44 AM

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
884	As discussed when this function was added initially, this routine should not modify its CSI argument. If we want to add the SP to the general set of spilled and restored registers, this should be done in determineCalleeSaves instead.
1168	Why do we need to override the default implementation here? That shouldn't be necessary if we've set everything else up correctly.
1180	In particular, this looks quite wrong. It's true with most ABIs that arguments are in the parent's frame, but that should have been handled elsewhere, it shouldn't require special code in this routine.

Everybody0523 added inline comments. Nov 29 2021, 12:10 PM

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
1180	Should this be done in LowerFormalArguments instead? I'm sorry but I'm not quite sure where else to do it.

Everybody0523 updated this revision to Diff 390491. Nov 29 2021, 3:17 PM

Everybody0523 marked an inline comment as not done.

Harbormaster completed remote builds in B136551: Diff 390491. Nov 29 2021, 4:00 PM

Pulling out the getFrameIndexReference discussion into the main comments.

First of all, let me review the overall stack layout with the z/OS XPLINK calling convention. To my knowledge, the stack looks basically like this (from high addresses to low addresses):

             ========================================================
             Caller's local variables + spill area
             --------------------------------------------------------
             Caller's argument area (passed to callee)
SP1 + 2176   --------------------------------------------------------
             Caller's 128-byte GPR save area
SP1 + 2048   ========================================================
             Callee's local variables + spill area
             --------------------------------------------------------
             (unless leaf) Callee's argument area 
             --------------------------------------------------------
             (unless leaf) Callee's 128-byte GPR save area
SP2 + 2048   ========================================================


SP1:  Stack pointer at function entry to callee (beginning of prologue)
SP2:  Stack pointer during callee (end of prologue)

SP1 - SP2 == Callee's frame size

Is this correct? If not, please correct. (In any case, it would be good to have a diagram along those lines as comments in this file as well.)

The first question is, what is the base to be used for frame references? There is some flexibility, but it seems most useful to use the incoming SP, modulo the stack pointer bias, for that (i.e. the value "SP1 + 2048" in the above diagram). This would mean that you access the incoming parameters using positive offsets relative to that base, and local variables using negative offsets relative to that base.
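As a rough illustration of this choice of base, the sketch below computes addresses relative to "SP1 + 2048" starting from SP2. The names (`StackPointerBias`, `GPRSaveAreaSize`, `frameBase`, `objectAddress`) and the sample values are assumptions made for illustration and are not code from the patch; only the 2048-byte bias and the 128-byte GPR save area are taken from the diagram above.

```cpp
#include <cstdint>

// Stack pointer bias from the diagram above ("SP + 2048").
constexpr int64_t StackPointerBias = 2048;
// Size of the caller's GPR save area from the diagram above.
constexpr int64_t GPRSaveAreaSize = 128;

// Frame base: the incoming SP (SP1) plus the bias.
constexpr int64_t frameBase(int64_t IncomingSP) {
  return IncomingSP + StackPointerBias;
}

// Address of a frame object, starting from the SP after the prologue (SP2).
// SP1 is recovered as SP2 + FrameSize, since SP1 - SP2 == frame size.
constexpr int64_t objectAddress(int64_t CurrentSP, int64_t FrameSize,
                                int64_t ObjectOffset) {
  return frameBase(CurrentSP + FrameSize) + ObjectOffset;
}

// Hypothetical numbers: SP2 = 0x10000, frame size = 256 bytes.
// A local at offset -16 lies below the base, inside the callee's frame.
static_assert(objectAddress(0x10000, 256, -16) == 0x10000 + 256 + 2048 - 16,
              "local variable slot");
// The first incoming argument slot starts right above the caller's
// GPR save area, i.e. at SP1 + 2176 in the diagram.
static_assert(objectAddress(0x10000, 256, GPRSaveAreaSize) ==
                  0x10000 + 256 + 2176,
              "first incoming argument slot");
```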

The getFrameIndexReference routine needs to compute that base value (and then add the offset) starting from current register values, i.e. SP2 in the above diagram. The default computation (done by the default implementation of the routine) does:

return StackOffset::getFixed(MFI.getObjectOffset(FI) + MFI.getStackSize() -
                             getOffsetOfLocalArea() +
                             MFI.getOffsetAdjustment());

Your proposed platform-specific implementation does instead:

return StackOffset::getFixed(MFI.getObjectOffset(FI) + getAllocatedStackSize(MF) +
                             getOffsetOfLocalArea() +
                             MFI.getOffsetAdjustment());

There are two differences:
1.) you use getAllocatedStackSize instead of MFI.getStackSize
2.) you add instead of subtract the getOffsetOfLocalArea value

Both of these look suspicious to me.
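To make the two differences concrete, here is a small stand-alone sketch that evaluates both formulas on the same inputs. `FrameValues`, `defaultOffset`, `proposedOffset` and the sample numbers are hypothetical stand-ins for the MachineFrameInfo queries quoted above; the only values taken from the discussion are the 128-byte GPR save area and the local area offset of 128.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the values queried in getFrameIndexReference.
struct FrameValues {
  int64_t ObjectOffset;     // MFI.getObjectOffset(FI)
  int64_t StackSize;        // MFI.getStackSize()
  int64_t LocalAreaOffset;  // getOffsetOfLocalArea()
  int64_t OffsetAdjustment; // MFI.getOffsetAdjustment()
};

// Per the discussion, getAllocatedStackSize adds the 128-byte GPR save area.
constexpr int64_t GPRSaveAreaSize = 128;

// Default (common code) computation.
constexpr int64_t defaultOffset(FrameValues V) {
  return V.ObjectOffset + V.StackSize - V.LocalAreaOffset +
         V.OffsetAdjustment;
}

// Computation proposed in the patch under review.
constexpr int64_t proposedOffset(FrameValues V) {
  return V.ObjectOffset + (V.StackSize + GPRSaveAreaSize) +
         V.LocalAreaOffset + V.OffsetAdjustment;
}

// Assumed sample: object at -16, 256-byte frame, local area offset 128,
// adjustment 2048.
constexpr FrameValues Sample{-16, 256, 128, 2048};
static_assert(defaultOffset(Sample) == 2160, "default formula");
static_assert(proposedOffset(Sample) == 2544, "proposed formula");
// The gap is the save area plus twice the local area offset (sign flip).
static_assert(proposedOffset(Sample) - defaultOffset(Sample) ==
                  GPRSaveAreaSize + 2 * Sample.LocalAreaOffset,
              "difference between the two formulas");
```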

As to the first difference, the difference between getAllocatedStackSize and MFI.getStackSize is that the former adds the 128 bytes that are used for the (callee's) GPR save area, which wasn't part of the stack frame size before. I think it would be preferable to actually add that size to the stack size, so it will be included in MFI.getStackSize to begin with. This would also help other users of that routine.

To do so, you should simply call MFFrame.setStackSize in the emitPrologue routine, similar to what is already done in the ELF ABI case.

As to the second difference, I do not really understand how the local area offset is currently being used by the XPLINK ABI. The constructor currently sets this up as:

SystemZXPLINKFrameLowering::SystemZXPLINKFrameLowering()
    : SystemZFrameLowering(TargetFrameLowering::StackGrowsUp, Align(32), 128,
                           Align(32), /* StackRealignable */ false),

So we have an upwards growing stack with a local area offset of 128 bytes. This seems wrong. First of all, the stack is actually downwards growing - note how emitPrologue subtracts from the stack pointer (for an upwards growing stack it would add to the stack pointer).

So one change might be to use StackGrowsDown instead, and then use -128 instead of 128 for the local area offset. (With StackGrowsDown, the offset must be negative.) That would already fix the sign problem your code works around.
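The sign argument can be checked with a short sketch. The function names and sample values are assumptions for illustration; the stack size is passed in unchanged to both formulas so that only the sign of the local area offset differs.

```cpp
#include <cstdint>

// Common-code formula as quoted above: the local area offset is subtracted.
constexpr int64_t defaultOffset(int64_t ObjectOffset, int64_t StackSize,
                                int64_t LocalAreaOffset, int64_t Adjustment) {
  return ObjectOffset + StackSize - LocalAreaOffset + Adjustment;
}

// Workaround formula from the patch under review: the offset is added.
constexpr int64_t workaroundOffset(int64_t ObjectOffset, int64_t StackSize,
                                   int64_t LocalAreaOffset,
                                   int64_t Adjustment) {
  return ObjectOffset + StackSize + LocalAreaOffset + Adjustment;
}

// With a local area offset of -128, the default formula already yields
// what the workaround computes with +128 (hypothetical sample values).
static_assert(defaultOffset(-16, 384, -128, 2048) ==
                  workaroundOffset(-16, 384, 128, 2048),
              "negative local area offset removes the need for the sign flip");
```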

However, I think even that is really incorrect. I don't believe in the above diagram we have anything that would match the "local area offset" as used by common code here -- that offset would refer to some fixed members placed between the base of the frame (i.e. SP1 + 2048 as discussed above) and the local variable/spill area -- but there are no fixed members in that place. So I think this really should be 0 anyway.

Maybe the intent was to use the top of the caller's GPR save area as the frame base instead? Then the 128 byte would account for that incoming space. But I'm not sure this has any particular benefit. (@Kai, any comments on that?)

Note that the parameter setup code in LowerFormalArguments et al. currently also assumes that the frame offsets for incoming parameters are relative to the top of the caller's GPR save area, and not the incoming SP. So depending on the decision about which point to use as the frame base, we may need to update how those offsets are being computed as well.

Reading the long comment, I guess that there is a systematic error in the calculation of the stack offsets.

First, the diagram looks correct to me. And obviously, the stack grows down.

The local area offset is used to "jump" over the 128 byte GPR save area. According to your description, the semantics of the local area offset are different, which indicates to me that this field should not be used for this purpose. I guess the different sign of the LocalAreaOffset is also the cause of the strange calculation in getAllocatedStackSize().

For the frame reference, the idea seems to be to reach out into the caller's stack frame, as the formula ends up being: "stack bias + stack size of function + jump over GPR save area + object offset". While the offset of incoming arguments is relative to the GPR save area, the fixed spill offsets are relative to stack pointer =

Well, thinking about all this I guess the best way forward is:

- not using the LocalAreaOffset, because it has different semantics
- get rid of getAllocatedStackSize()
- use "incoming SP + stackbias" as base for all stack offset calculations, because that is more obvious

In D114457#3174537, @Kai wrote:

Well, thinking about all this I guess the best way forward is:

- not using the LocalAreaOffset, because it has different semantics
- get rid of getAllocatedStackSize()
- use "incoming SP + stackbias" as base for all stack offset calculations, because that is more obvious

Agreed. For the last part, the only thing that needs to change is the fixed stack offsets for the parameter slots, which now need to include the 128 byte area. This could be done locally in LowerFormalArguments when doing the CreateFixedObject call. But I suspect it might be more straightforward to change the offset computation in SystemZCallingConv.td to have the correct offset in VA.getLocMemOffset() to begin with.
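A minimal sketch of the "local" option, assuming the 128-byte area is simply added when the fixed object is created. `CallFrameSize` and `fixedObjectOffset` are hypothetical names; in the real code the inputs would come from the target's special-registers object and from VA.getLocMemOffset().

```cpp
#include <cstdint>

// Assumed size of the GPR save area that separates the frame base
// (incoming SP + bias) from the first parameter slot.
constexpr int64_t CallFrameSize = 128;

// LocMemOffset is relative to the top of the caller's GPR save area;
// the result is relative to the frame base "incoming SP + bias".
constexpr int64_t fixedObjectOffset(int64_t LocMemOffset) {
  return CallFrameSize + LocMemOffset;
}

static_assert(fixedObjectOffset(0) == 128, "first parameter slot");
static_assert(fixedObjectOffset(8) == 136, "second 8-byte parameter slot");
```

The alternative mentioned above would instead make the calling-convention description produce the larger offset directly, so that no addition is needed at the CreateFixedObject call.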

Everybody0523 updated this revision to Diff 392454. Dec 7 2021, 10:16 AM

uweigand added inline comments. Dec 7 2021, 10:48 AM

llvm/lib/Target/SystemZ/SystemZCallingConv.td
169	I'm wondering if it wouldn't be more straightforward to handle this in determineCalleeSaves, in the place where the frame pointer register is handled?
llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
1055	This is no longer called anywhere and should be deleted.
1164	This function is now actually identical to the default implementation and should also be deleted.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1504	It's a bit ugly, but I'd be OK with it for now. It would be good to add a FIXME saying that ideally the call frame size should have already been included in the offset.

Harbormaster completed remote builds in B137944: Diff 392454. Dec 7 2021, 11:04 AM

Everybody0523 updated this revision to Diff 392524. Dec 7 2021, 1:45 PM

Harbormaster completed remote builds in B137990: Diff 392524. Dec 7 2021, 2:22 PM

Everybody0523 added inline comments. Dec 7 2021, 8:43 PM

llvm/lib/Target/SystemZ/SystemZCallingConv.td
169	That's what this line is for. Adding a register to SavedRegs in determineCalleeSaves doesn't do anything unless that register is marked as a callee-saved register here.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1504	So originally I actually included it in the offset, but I found that it would force similar ugliness in the LowerCall routine. As I understand it, if the call frame size is included in the offset just for XPLINK, then in LowerCall we would have to add the call frame size only for ELF targets, since for XPLINK it would already be included. ELF callees instead rely on `SystemZELFFrameLowering::getFrameIndexReference` to jump over the call frame size. So I think if we were to add each target's call frame size to the offset, we could actually eliminate the override of getFrameIndexReference for ELF as well. Please correct me if I'm mistaken about the ELF code though. With that said, I'd prefer not to touch the ELF code in this patch, so if you're OK with it for now I'll just leave it as a FIXME.

Everybody0523 updated this revision to Diff 392635. Dec 7 2021, 8:46 PM

Harbormaster completed remote builds in B138070: Diff 392635. Dec 7 2021, 9:38 PM

LGTM.

llvm/lib/Target/SystemZ/SystemZCallingConv.td
169	Ah, I see. This is OK then.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1504	Indeed, this is another place where the current abstractions break down. I think the proper fix will be to change that line to simply `unsigned Offset = Regs->getStackPointerBias() + VA.getLocMemOffset();` and then set StackPointerBias to 160 for ELF. Those 160 bytes are really a stack bias on ELF: they're part of the callee's stack frame, but always allocated by the caller. But all this should be done in a follow-up patch. For now, this patch LGTM as is.

This revision is now accepted and ready to land. Dec 8 2021, 3:00 AM

Everybody0523 updated this revision to Diff 392794. Dec 8 2021, 8:29 AM

Harbormaster completed remote builds in B138179: Diff 392794. Dec 8 2021, 9:25 AM

uweigand mentioned this in D115269: [SystemZ][z/OS] Add entry point marker to PPA. Dec 9 2021, 9:35 AM

Closed by commit rGffad4d777b22: [z/OS] Implement prologue and epilogue generation for z/OS target. (authored by Everybody0523, committed by Kai). Dec 13 2021, 2:03 PM

This revision was automatically updated to reflect the committed changes.

Kai added a commit: rGffad4d777b22: [z/OS] Implement prologue and epilogue generation for z/OS target..

Hey folks, it looks like this commit has caused the buildbot here to run into failures. Can someone please revert this change, fix it and then commit it again?

muiez added a reverting change: rGebf5497b269f: Revert "[z/OS] Implement prologue and epilogue generation for z/OS target.". Dec 14 2021, 11:23 AM

In D114457#3192772, @shraiysh wrote:

Hey folks, it looks like this commit has caused the buildbot here to run into failures. Can someone please revert this change, fix it and then commit it again?

Yup, sure. Reverted here: https://github.com/llvm/llvm-project/commit/ebf5497b269f5769b53320dd81290714642e4306

This update correctly generates the STG instruction used to save the frame pointer. It was previously generated as follows:

BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG), SystemZ::R0D)
    .addReg(SystemZ::R4D)
    .addImm(Offset)
    .addReg(0);

This was incorrect because that form of BuildMI treats its register argument as the instruction's destination, so R0D was given the RegState::Define flag, which does not make sense for a store instruction. The correct way is to generate the instruction as follows:

BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG))
    .addReg(SystemZ::R0D)
    .addReg(SystemZ::R4D)
    .addImm(Offset)
    .addReg(0);

Unfortunately it seems that this did not trigger an error unless EXPENSIVE_CHECKS was used, hence the buildbot failure.

Harbormaster completed remote builds in B139295: Diff 394366. Dec 14 2021, 2:28 PM

uweigand added inline comments. Dec 15 2021, 12:08 AM

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
1115	I think you also need to add `RegState::Kill` for `R0D` here.

Add RegState::Kill to MIBuilder when generating STG to store R0D into stack slot.
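The operand-flag issue behind the revert can be pictured with a toy model. Everything below is hypothetical and only imitates the shape of the builder calls quoted above; it is not LLVM's MachineInstrBuilder API, it omits the immediate and index operands, and it needs C++14 or later for the constexpr member function.

```cpp
// Toy operand flags, loosely modelled on RegState::Define / RegState::Kill.
enum RegFlag : unsigned { FlagNone = 0, FlagDefine = 1u << 0, FlagKill = 1u << 1 };

struct Operand {
  int Reg = 0;
  unsigned Flags = FlagNone;
};

struct ToyInstr {
  Operand Ops[4] = {};
  int NumOps = 0;

  // Append a register operand with the given flags.
  constexpr ToyInstr &addReg(int Reg, unsigned Flags = FlagNone) {
    Ops[NumOps].Reg = Reg;
    Ops[NumOps].Flags = Flags;
    ++NumOps;
    return *this;
  }
};

// Models the builder form that takes a destination register:
// the register becomes operand 0 and is flagged as a definition.
constexpr ToyInstr buildWithDest(int DestReg) {
  ToyInstr MI{};
  MI.addReg(DestReg, FlagDefine);
  return MI;
}

// Models the builder form without a destination register.
constexpr ToyInstr build() { return ToyInstr{}; }

constexpr bool hasFlag(const Operand &Op, RegFlag F) {
  return (Op.Flags & F) != 0;
}

constexpr int R0D = 0, R4D = 4; // toy register numbers

// Original form: the stored value (R0D) ends up marked as a definition.
constexpr ToyInstr Wrong = buildWithDest(R0D).addReg(R4D);
// Corrected form, including the kill flag requested in review.
constexpr ToyInstr Right = build().addReg(R0D, FlagKill).addReg(R4D);

static_assert(hasFlag(Wrong.Ops[0], FlagDefine), "store source marked as def");
static_assert(!hasFlag(Right.Ops[0], FlagDefine), "store source is a use");
static_assert(hasFlag(Right.Ops[0], FlagKill), "last use of R0D is a kill");
```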

Everybody0523 updated this revision to Diff 394596. Dec 15 2021, 9:57 AM

LGTM, please commit again.

Harbormaster completed remote builds in B139463: Diff 394596. Dec 15 2021, 10:51 AM

This revision was landed with ongoing or failed builds. Dec 16 2021, 6:04 AM

fanbo-meng added a commit: rG9a3584499015: [z/OS] Implement prologue and epilogue generation for z/OS target..

Revision Contents

Path                                                Size
llvm/lib/Target/SystemZ/SystemZCallingConv.td       1 line
llvm/lib/Target/SystemZ/SystemZFrameLowering.h      9 lines
llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp    179 lines
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp     12 lines
llvm/test/CodeGen/SystemZ/call-zos-01.ll            2 lines
llvm/test/CodeGen/SystemZ/call-zos-vec.ll           2 lines
llvm/test/CodeGen/SystemZ/zos-prologue-epilog.ll    81 lines

Diff 392635

llvm/lib/Target/SystemZ/SystemZCallingConv.td

	[lines elided]

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// z/OS XPLINK64 callee-saved registers			// z/OS XPLINK64 callee-saved registers
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// %R7D is volatile by the spec, but it must be saved in the prologue by			// %R7D is volatile by the spec, but it must be saved in the prologue by
	// any non-leaf function and restored in the epilogue for use by the			// any non-leaf function and restored in the epilogue for use by the
	// return instruction so it functions exactly like a callee-saved register.			// return instruction so it functions exactly like a callee-saved register.
	def CSR_SystemZ_XPLINK64 : CalleeSavedRegs<(add (sequence "R%dD", 7, 15),			def CSR_SystemZ_XPLINK64 : CalleeSavedRegs<(add (sequence "R%dD", 7, 15),
				(sequence "R%dD", 4, 4),
				uweigand (inline comment): I'm wondering if it wouldn't be more straightforward to handle this in determineCalleeSaves, in the place where the frame pointer register is handled?
				Everybody0523 (inline comment): That's what this line is for. Adding a register to SavedRegs in determineCalleeSaves doesn't do anything unless that register is marked as a callee-saved register here.
				uweigand (inline comment): Ah, I see. This is OK then.
	(sequence "F%dD", 15, 8))>;			(sequence "F%dD", 15, 8))>;

	def CSR_SystemZ_XPLINK64_Vector : CalleeSavedRegs<(add CSR_SystemZ_XPLINK64,			def CSR_SystemZ_XPLINK64_Vector : CalleeSavedRegs<(add CSR_SystemZ_XPLINK64,
	(sequence "V%d", 23, 16))>;			(sequence "V%d", 23, 16))>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// z/OS XPLINK64 return value calling convention			// z/OS XPLINK64 return value calling convention
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	[lines elided]

llvm/lib/Target/SystemZ/SystemZFrameLowering.h

[lines elided; context: public:]
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,		void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
RegScavenger *RS) const override;		RegScavenger *RS) const override;

bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,		bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI,		MachineBasicBlock::iterator MBBI,
ArrayRef<CalleeSavedInfo> CSI,		ArrayRef<CalleeSavedInfo> CSI,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;

		bool
		restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
		MachineBasicBlock::iterator MBBII,
		MutableArrayRef<CalleeSavedInfo> CSI,
		const TargetRegisterInfo *TRI) const override;

void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;		void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;

void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;		void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;

bool hasFP(const MachineFunction &MF) const override;		bool hasFP(const MachineFunction &MF) const override;

		void processFunctionBeforeFrameFinalized(MachineFunction &MF,
		RegScavenger *RS) const override;
};		};
} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp

[lines elided; context: bool SystemZELFFrameLowering::usePackedStack(MachineFunction &MF) const {]
bool SoftFloat = MF.getSubtarget<SystemZSubtarget>().hasSoftFloat();		bool SoftFloat = MF.getSubtarget<SystemZSubtarget>().hasSoftFloat();
if (HasPackedStackAttr && BackChain && !SoftFloat)		if (HasPackedStackAttr && BackChain && !SoftFloat)
report_fatal_error("packed-stack + backchain + hard-float is unsupported.");		report_fatal_error("packed-stack + backchain + hard-float is unsupported.");
bool CallConv = MF.getFunction().getCallingConv() != CallingConv::GHC;		bool CallConv = MF.getFunction().getCallingConv() != CallingConv::GHC;
return HasPackedStackAttr && CallConv;		return HasPackedStackAttr && CallConv;
}		}

SystemZXPLINKFrameLowering::SystemZXPLINKFrameLowering()		SystemZXPLINKFrameLowering::SystemZXPLINKFrameLowering()
: SystemZFrameLowering(TargetFrameLowering::StackGrowsUp, Align(32), 128,		: SystemZFrameLowering(TargetFrameLowering::StackGrowsDown, Align(32), 0,
Align(32), /* StackRealignable */ false),		Align(32), /* StackRealignable */ false),
RegSpillOffsets(-1) {		RegSpillOffsets(-1) {

// Create a mapping from register number to save slot offset.		// Create a mapping from register number to save slot offset.
// These offsets are relative to the start of the local are area.		// These offsets are relative to the start of the local are area.
RegSpillOffsets.grow(SystemZ::NUM_TARGET_REGS);		RegSpillOffsets.grow(SystemZ::NUM_TARGET_REGS);
for (unsigned I = 0, E = array_lengthof(XPLINKSpillOffsetTable); I != E; ++I)		for (unsigned I = 0, E = array_lengthof(XPLINKSpillOffsetTable); I != E; ++I)
RegSpillOffsets[XPLINKSpillOffsetTable[I].Reg] =		RegSpillOffsets[XPLINKSpillOffsetTable[I].Reg] =
[lines elided; context: bool SystemZXPLINKFrameLowering::assignCalleeSavedSpillSlots(]
// For non-leaf functions:		// For non-leaf functions:
// - the address of callee (entry point) register R6 must be saved		// - the address of callee (entry point) register R6 must be saved
Spills.push_back(CalleeSavedInfo(Regs.getAddressOfCalleeRegister()));		Spills.push_back(CalleeSavedInfo(Regs.getAddressOfCalleeRegister()));

// If the function needs a frame pointer, or if the backchain pointer should		// If the function needs a frame pointer, or if the backchain pointer should
// be stored, then save the stack pointer register R4.		// be stored, then save the stack pointer register R4.
if (hasFP(MF) || MF.getFunction().hasFnAttribute("backchain"))		if (hasFP(MF) || MF.getFunction().hasFnAttribute("backchain"))
Spills.push_back(CalleeSavedInfo(RegSP));		Spills.push_back(CalleeSavedInfo(RegSP));

		uweigand (inline comment): As discussed when this function was added initially, this routine should not modify its CSI argument. If we want to add the SP to the general set of spilled and restored registers, this should be done in determineCalleeSaves instead.
// Save the range of call-saved registers, for use by the		// Save the range of call-saved registers, for use by the
// prologue/epilogue inserters.		// prologue/epilogue inserters.
ProcessCSI(CSI);		ProcessCSI(CSI);
MFI->setRestoreGPRRegs(LowGPR, HighGPR, LowOffset);		MFI->setRestoreGPRRegs(LowGPR, HighGPR, LowOffset);

// Save the range of call-saved registers, for use by the epilogue inserter.		// Save the range of call-saved registers, for use by the epilogue inserter.
ProcessCSI(Spills);		ProcessCSI(Spills);
MFI->setSpillGPRRegs(LowGPR, HighGPR, LowOffset);		MFI->setSpillGPRRegs(LowGPR, HighGPR, LowOffset);
[lines elided; context: if (SystemZ::VR128BitRegClass.contains(Reg)) {]
TII->storeRegToStackSlot(MBB, MBBI, Reg, true, CSI[I].getFrameIdx(),		TII->storeRegToStackSlot(MBB, MBBI, Reg, true, CSI[I].getFrameIdx(),
&SystemZ::VR128BitRegClass, TRI);		&SystemZ::VR128BitRegClass, TRI);
}		}
}		}

return true;		return true;
}		}

		bool SystemZXPLINKFrameLowering::restoreCalleeSavedRegisters(
		MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
		MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {

		if (CSI.empty())
		return false;

		MachineFunction &MF = *MBB.getParent();
		SystemZMachineFunctionInfo *ZFI = MF.getInfo<SystemZMachineFunctionInfo>();
		const SystemZSubtarget &Subtarget = MF.getSubtarget<SystemZSubtarget>();
		const TargetInstrInfo *TII = Subtarget.getInstrInfo();
		auto &Regs = Subtarget.getSpecialRegisters<SystemZXPLINK64Registers>();

		DebugLoc DL = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc();

		// Restore FPRs in the normal TargetInstrInfo way.
		for (unsigned I = 0, E = CSI.size(); I != E; ++I) {
		unsigned Reg = CSI[I].getReg();
		if (SystemZ::FP64BitRegClass.contains(Reg))
		TII->loadRegFromStackSlot(MBB, MBBI, Reg, CSI[I].getFrameIdx(),
		&SystemZ::FP64BitRegClass, TRI);
		if (SystemZ::VR128BitRegClass.contains(Reg))
		TII->loadRegFromStackSlot(MBB, MBBI, Reg, CSI[I].getFrameIdx(),
		&SystemZ::VR128BitRegClass, TRI);
		}

		// Restore call-saved GPRs (but not call-clobbered varargs, which at
		// this point might hold return values).
		SystemZ::GPRRegs RestoreGPRs = ZFI->getRestoreGPRRegs();
		if (RestoreGPRs.LowGPR) {
		assert(isInt<20>(Regs.getStackPointerBias() + RestoreGPRs.GPROffset));
		if (RestoreGPRs.LowGPR == RestoreGPRs.HighGPR)
		// Build an LG/L instruction.
		BuildMI(MBB, MBBI, DL, TII->get(SystemZ::LG), RestoreGPRs.LowGPR)
		.addReg(Regs.getStackPointerRegister())
		.addImm(Regs.getStackPointerBias() + RestoreGPRs.GPROffset)
		.addReg(0);
		else {
		// Build an LMG/LM instruction.
		MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(SystemZ::LMG));

		// Add the explicit register operands.
		MIB.addReg(RestoreGPRs.LowGPR, RegState::Define);
		MIB.addReg(RestoreGPRs.HighGPR, RegState::Define);

		// Add the address.
		MIB.addReg(Regs.getStackPointerRegister());
		MIB.addImm(Regs.getStackPointerBias() + RestoreGPRs.GPROffset);

		// Do a second scan adding regs as being defined by instruction
		for (unsigned I = 0, E = CSI.size(); I != E; ++I) {
		unsigned Reg = CSI[I].getReg();
		if (Reg > RestoreGPRs.LowGPR && Reg < RestoreGPRs.HighGPR)
		MIB.addReg(Reg, RegState::ImplicitDefine);
		}
		}
		}

		return true;
		}

void SystemZXPLINKFrameLowering::emitPrologue(MachineFunction &MF,		void SystemZXPLINKFrameLowering::emitPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {}		MachineBasicBlock &MBB) const {
		uweigand (inline comment): This is no longer called anywhere and should be deleted.
		assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");
		const SystemZSubtarget &Subtarget = MF.getSubtarget<SystemZSubtarget>();
		SystemZMachineFunctionInfo *ZFI = MF.getInfo<SystemZMachineFunctionInfo>();
		MachineBasicBlock::iterator MBBI = MBB.begin();
		auto ZII = static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());
		auto &Regs = Subtarget.getSpecialRegisters<SystemZXPLINK64Registers>();
		MachineFrameInfo &MFFrame = MF.getFrameInfo();
		MachineInstr *StoreInstr = nullptr;
		bool HasFP = hasFP(MF);
		// Debug location must be unknown since the first debug location is used
		// to determine the end of the prologue.
		DebugLoc DL;
		uint64_t Offset = 0;

		// TODO: Support leaf functions; only add size of save+reserved area when
		// function is non-leaf.
		MFFrame.setStackSize(MFFrame.getStackSize() + Regs.getCallFrameSize());
		uint64_t StackSize = MFFrame.getStackSize();

		// FIXME: Implement support for large stack sizes, when the stack extension
		// routine needs to be called.
		if (StackSize > 1024 * 1024) {
		llvm_unreachable("Huge Stack Frame not yet supported on z/OS");
		}

		if (ZFI->getSpillGPRRegs().LowGPR) {
		// Skip over the GPR saves.
		if ((MBBI != MBB.end()) && ((MBBI->getOpcode() == SystemZ::STMG))) {
		const int Operand = 3;
		// Now we can set the offset for the operation, since now the Stack
		// has been finalized.
		Offset = Regs.getStackPointerBias() + MBBI->getOperand(Operand).getImm();
		// Maximum displacement for STMG instruction.
		if (isInt<20>(Offset - StackSize))
		Offset -= StackSize;
		else
		StoreInstr = &*MBBI;
		MBBI->getOperand(Operand).setImm(Offset);
		++MBBI;
		} else
		llvm_unreachable("Couldn't skip over GPR saves");
		}

		if (StackSize) {
		MachineBasicBlock::iterator InsertPt = StoreInstr ? StoreInstr : MBBI;
		// Allocate StackSize bytes.
		int64_t Delta = -int64_t(StackSize);

		// In case the STM(G) instruction also stores SP (R4), but the displacement
		// is too large, the SP register is manipulated first before storing,
		// resulting in the wrong value stored and retrieved later. In this case, we
		// need to temporarily save the value of SP, and store it later to memory.
		if (StoreInstr && HasFP) {
		// Insert LR r0,r4 before STMG instruction.
		BuildMI(MBB, InsertPt, DL, ZII->get(SystemZ::LGR))
		.addReg(SystemZ::R0D, RegState::Define)
		.addReg(SystemZ::R4D);
		// Insert ST r0,xxx(,r4) after STMG instruction.
		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG), SystemZ::R0D)
		.addReg(SystemZ::R4D)
		uweigand (inline comment): I think you also need to add `RegState::Kill` for `R0D` here.
		.addImm(Offset)
		.addReg(0);
		}

		emitIncrement(MBB, InsertPt, DL, Regs.getStackPointerRegister(), Delta,
		ZII);
		}

		if (HasFP) {
		// Copy the base of the frame to Frame Pointer Register.
		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR),
		Regs.getFramePointerRegister())
		.addReg(Regs.getStackPointerRegister());

		// Mark the FramePtr as live at the beginning of every block except
		// the entry block. (We'll have marked R8 as live on entry when
		// saving the GPRs.)
		for (auto I = std::next(MF.begin()), E = MF.end(); I != E; ++I)
		I->addLiveIn(Regs.getFramePointerRegister());
		}
		}

void SystemZXPLINKFrameLowering::emitEpilogue(MachineFunction &MF,		void SystemZXPLINKFrameLowering::emitEpilogue(MachineFunction &MF,
MachineBasicBlock &MBB) const {}		MachineBasicBlock &MBB) const {
		const SystemZSubtarget &Subtarget = MF.getSubtarget<SystemZSubtarget>();
		MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
		SystemZMachineFunctionInfo *ZFI = MF.getInfo<SystemZMachineFunctionInfo>();
		MachineFrameInfo &MFFrame = MF.getFrameInfo();
		auto ZII = static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());
		auto &Regs = Subtarget.getSpecialRegisters<SystemZXPLINK64Registers>();

		// Skip the return instruction.
		assert(MBBI->isReturn() && "Can only insert epilogue into returning blocks");

		uint64_t StackSize = MFFrame.getStackSize();
		if (StackSize) {
		unsigned SPReg = Regs.getStackPointerRegister();
		if (ZFI->getRestoreGPRRegs().LowGPR != SPReg) {
		DebugLoc DL = MBBI->getDebugLoc();
		emitIncrement(MBB, MBBI, DL, SPReg, StackSize, ZII);
		}
		}
		}

bool SystemZXPLINKFrameLowering::hasFP(const MachineFunction &MF) const {		bool SystemZXPLINKFrameLowering::hasFP(const MachineFunction &MF) const {
return false;		return (MF.getFrameInfo().hasVarSizedObjects());
		}

		void SystemZXPLINKFrameLowering::processFunctionBeforeFrameFinalized(
		uweigand (inline comment): This function is now actually identical to the default implementation and should also be deleted.
		MachineFunction &MF, RegScavenger *RS) const {
		MachineFrameInfo &MFFrame = MF.getFrameInfo();
		const SystemZSubtarget &Subtarget = MF.getSubtarget<SystemZSubtarget>();
		auto &Regs = Subtarget.getSpecialRegisters<SystemZXPLINK64Registers>();
		uweigand: Why do we need to override the default implementation here? That shouldn't be necessary if we've set everything else up correctly.

		// Setup stack frame offset
		MFFrame.setOffsetAdjustment(Regs.getStackPointerBias());
}		}
		uweigand: In particular, this looks quite wrong. It's true with most ABIs that arguments are in the parent's frame, but that should have been handled elsewhere; it shouldn't require special code in this routine.
		Everybody0523 (author): Should this be done in LowerFormalArguments instead? I'm sorry, but I'm not quite sure where else to do it.
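
The condition guarding the stack pointer adjustment in emitEpilogue above can be restated as a small standalone predicate. This is only a sketch for readers of the diff: the function name and parameters are hypothetical and do not exist in LLVM, and the register numbers in the usage note are taken from the CHECK lines of zos-prologue-epilog.ll rather than from the implementation.

```cpp
#include <cstdint>

// Hypothetical, simplified model of the check in emitEpilogue.
// stackSize:      size of the frame allocated by the prologue
// lowRestoredGPR: lowest GPR reloaded by the epilogue's restore instruction
// spReg:          the ABI stack pointer register (r4 in the tests)
bool needsExplicitSPIncrement(uint64_t stackSize, unsigned lowRestoredGPR,
                              unsigned spReg) {
  // Nothing was allocated, so there is nothing to undo.
  if (stackSize == 0)
    return false;
  // If the restore already reloads the stack pointer itself, adding the
  // stack size back would be redundant.
  return lowRestoredGPR != spReg;
}
```

Under this reading, func0 (which restores only r7 and allocates 192 bytes) needs the trailing aghi, while func4 (which restores r4 through r8 with lmg) does not.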

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp


	if (VA.isRegLoc()) {

Register VReg = MRI.createVirtualRegister(RC);		Register VReg = MRI.createVirtualRegister(RC);
MRI.addLiveIn(VA.getLocReg(), VReg);		MRI.addLiveIn(VA.getLocReg(), VReg);
ArgValue = DAG.getCopyFromReg(Chain, DL, VReg, LocVT);		ArgValue = DAG.getCopyFromReg(Chain, DL, VReg, LocVT);
} else {		} else {
assert(VA.isMemLoc() && "Argument not register or memory");		assert(VA.isMemLoc() && "Argument not register or memory");

// Create the frame index object for this incoming parameter.		// Create the frame index object for this incoming parameter.
int FI = MFI.CreateFixedObject(LocVT.getSizeInBits() / 8,		// FIXME: Pre-include call frame size in the offset, should not
VA.getLocMemOffset(), true);		// need to manually add it here.
		uweigand: It's a bit ugly, but I'd be OK with it for now. It would be good to add a FIXME saying that ideally the call frame size should have already been included in the offset.
		Everybody0523 (author, done): Originally I did include it in the offset, but I found that it would force similar ugliness in the LowerCall routine. As I understand it, if the call frame size is included in the offset just for XPLINK, then in that location we would have to add the call frame size only for ELF targets, since for XPLINK it is already included. ELF callees instead rely on `SystemZELFFrameLowering::getFrameIndexReference` to jump over the call frame size. So I think if we were to add each target's call frame size to the offset, we could actually eliminate the override of getFrameIndexReference for ELF as well. Please correct me if I'm mistaken about the ELF code, though. That said, I'd prefer not to touch the ELF code in this patch, so if you're OK with it for now I'll just leave it as a FIXME.
		uweigand: Indeed, this is another place where the current abstractions break down. I think the proper fix will be to change that line to simply `unsigned Offset = Regs->getStackPointerBias() + VA.getLocMemOffset();` and then set StackPointerBias to 160 for ELF. Those 160 bytes are really a stack bias on ELF: they're part of the callee's stack frame, but always allocated by the caller. But all this should be done in a follow-up patch. For now, this patch LGTM as is.
		int64_t ArgSPOffset = VA.getLocMemOffset();
		Lint: Pre-merge checks: clang-format: please reformat the code.
		if (Subtarget.isTargetXPLINK64()) {
		auto &XPRegs =
		Subtarget.getSpecialRegisters<SystemZXPLINK64Registers>();
		ArgSPOffset += XPRegs.getCallFrameSize();
		}
		int FI =
		MFI.CreateFixedObject(LocVT.getSizeInBits() / 8, ArgSPOffset, true);

// Create the SelectionDAG nodes corresponding to a load		// Create the SelectionDAG nodes corresponding to a load
// from this parameter. Unpromoted ints and floats are		// from this parameter. Unpromoted ints and floats are
// passed as right-justified 8-byte values.		// passed as right-justified 8-byte values.
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);		SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
if (VA.getLocVT() == MVT::i32 || VA.getLocVT() == MVT::f32)		if (VA.getLocVT() == MVT::i32 || VA.getLocVT() == MVT::f32)
FIN = DAG.getNode(ISD::ADD, DL, PtrVT, FIN,		FIN = DAG.getNode(ISD::ADD, DL, PtrVT, FIN,
DAG.getIntPtrConstant(4, DL));		DAG.getIntPtrConstant(4, DL));
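
The offset adjustment introduced in this hunk can be illustrated in isolation. The following is a minimal sketch, not the LLVM code: `AbiStackInfo` and `fixedObjectOffset` are hypothetical names, and the call frame sizes used in the usage note (128 for XPLINK64, 160 for ELF) are assumptions, the latter taken from the review discussion above.

```cpp
#include <cstdint>

// Hypothetical description of the ABI properties that matter here.
struct AbiStackInfo {
  bool isXPLINK64;       // true for the z/OS XPLINK64 ABI
  int64_t callFrameSize; // size of the caller-allocated call frame
};

// Mirrors the patched logic: start from the calling-convention offset and,
// for XPLINK64 only, skip over the call frame. ELF targets are left
// unchanged because they account for the call frame elsewhere.
int64_t fixedObjectOffset(int64_t locMemOffset, const AbiStackInfo &abi) {
  int64_t argSPOffset = locMemOffset;
  if (abi.isXPLINK64)
    argSPOffset += abi.callFrameSize;
  return argSPOffset;
}
```

With these assumed values, an argument at calling-convention offset 24 would get a fixed object at offset 152 on XPLINK64 and at offset 24 on ELF. The follow-up suggested in the review would remove this target-specific branch.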

llvm/test/CodeGen/SystemZ/call-zos-01.ll

	define signext i64 @pass_long(i64 signext %arg0, i64 signext %arg1, i64 signext %arg2) {			define signext i64 @pass_long(i64 signext %arg0, i64 signext %arg1, i64 signext %arg2) {
	entry:			entry:
	%N = add i64 %arg0, %arg1			%N = add i64 %arg0, %arg1
	%M = add i64 %N, %arg2			%M = add i64 %N, %arg2
	ret i64 %M			ret i64 %M
	}			}

	; CHECK-LABEL: pass_integrals0:			; CHECK-LABEL: pass_integrals0:
	; CHECK: ag 2, -{{[0-9]+}}(4)			; CHECK: ag 2, 2328(4)
	; CHECK-NEXT: lgr 3, 2			; CHECK-NEXT: lgr 3, 2
	define signext i64 @pass_integrals0(i64 signext %arg0, i32 signext %arg1, i16 signext %arg2, i64 signext %arg3) {			define signext i64 @pass_integrals0(i64 signext %arg0, i32 signext %arg1, i16 signext %arg2, i64 signext %arg3) {
	entry:			entry:
	%N = sext i32 %arg1 to i64			%N = sext i32 %arg1 to i64
	%M = add i64 %arg3, %N			%M = add i64 %arg3, %N
	ret i64 %M			ret i64 %M
	}			}


llvm/test/CodeGen/SystemZ/call-zos-vec.ll

	; RUN: llc < %s -mtriple=s390x-ibm-zos -mcpu=z13 | FileCheck %s			; RUN: llc < %s -mtriple=s390x-ibm-zos -mcpu=z13 | FileCheck %s

	; CHECK-LABEL: sum_vecs0			; CHECK-LABEL: sum_vecs0
	; CHECK: vag 24, 24, 25			; CHECK: vag 24, 24, 25
	define <2 x i64> @sum_vecs0(<2 x i64> %v1, <2 x i64> %v2) {			define <2 x i64> @sum_vecs0(<2 x i64> %v1, <2 x i64> %v2) {
	entry:			entry:
	%add0 = add <2 x i64> %v1, %v2			%add0 = add <2 x i64> %v1, %v2
	ret <2 x i64> %add0			ret <2 x i64> %add0
	}			}

	; CHECK-LABEL: sum_vecs1			; CHECK-LABEL: sum_vecs1
	; CHECK: vaf 1, 24, 25			; CHECK: vaf 1, 24, 25
	; CHECK: vaf 1, 1, 26			; CHECK: vaf 1, 1, 26
	; CHECK: vaf 1, 1, 27			; CHECK: vaf 1, 1, 27
	; CHECK: vaf 1, 1, 28			; CHECK: vaf 1, 1, 28
	; CHECK: vaf 1, 1, 29			; CHECK: vaf 1, 1, 29
	; CHECK: vl 0, 32(4), 4			; CHECK: vl 0, 2432(4), 4
	; CHECK: vaf 1, 1, 30			; CHECK: vaf 1, 1, 30
	; CHECK: vaf 1, 1, 31			; CHECK: vaf 1, 1, 31
	; CHECK: vaf 24, 1, 0			; CHECK: vaf 24, 1, 0
	define <4 x i32> @sum_vecs1(<4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3, <4 x i32> %v4, <4 x i32> %v5, <4 x i32> %v6, <4 x i32> %v7, <4 x i32> %v8, <4 x i32> %v9) {			define <4 x i32> @sum_vecs1(<4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3, <4 x i32> %v4, <4 x i32> %v5, <4 x i32> %v6, <4 x i32> %v7, <4 x i32> %v8, <4 x i32> %v9) {
	entry:			entry:
	%add0 = add <4 x i32> %v1, %v2			%add0 = add <4 x i32> %v1, %v2
	%add1 = add <4 x i32> %add0, %v3			%add1 = add <4 x i32> %add0, %v3
	%add2 = add <4 x i32> %add1, %v4			%add2 = add <4 x i32> %add1, %v4
	; [41 lines not shown]

llvm/test/CodeGen/SystemZ/zos-prologue-epilog.ll

; Test the generated function prologs/epilogs under XPLINK64 on z/OS		; Test the generated function prologs/epilogs under XPLINK64 on z/OS
;		;
; RUN: llc < %s -mtriple=s390x-ibm-zos -mcpu=z13 | FileCheck --check-prefixes=CHECK64,CHECK %s		; RUN: llc < %s -mtriple=s390x-ibm-zos -mcpu=z13 | FileCheck --check-prefixes=CHECK64,CHECK %s

; Test prolog/epilog for non-XPLEAF.		; Test prolog/epilog for non-XPLEAF.

; Small stack frame.		; Small stack frame.
; CHECK-LABEL: func0		; CHECK-LABEL: func0
; CHECK64: stmg 6, 7		; CHECK64: stmg 6, 7, 1872(4)
		; stmg instruction's displacement field must be 2064-dsa_size
		; as per ABI
		; CHECK64: aghi 4, -192

		; CHECK64: lg 7, 2072(4)
		; CHECK64: aghi 4, 192
		; CHECK64: b 2(7)
define void @func0() {		define void @func0() {
call i64 (i64) @fun(i64 10)		call i64 (i64) @fun(i64 10)
ret void		ret void
}		}

; Spill all GPR CSRs		; Spill all GPR CSRs
; CHECK-LABEL: func1		; CHECK-LABEL: func1
; CHECK64: stmg 6, 15		; CHECK64: stmg 6, 15, 1904(4)
		; CHECK64: aghi 4, -160

		; CHECK64: lmg 7, 15, 2072(4)
		; CHECK64: aghi 4, 160
		; CHECK64: b 2(7)
define void @func1(i64 *%ptr) {		define void @func1(i64 *%ptr) {
%l01 = load volatile i64, i64 *%ptr		%l01 = load volatile i64, i64 *%ptr
%l02 = load volatile i64, i64 *%ptr		%l02 = load volatile i64, i64 *%ptr
%l03 = load volatile i64, i64 *%ptr		%l03 = load volatile i64, i64 *%ptr
%l04 = load volatile i64, i64 *%ptr		%l04 = load volatile i64, i64 *%ptr
%l05 = load volatile i64, i64 *%ptr		%l05 = load volatile i64, i64 *%ptr
%l06 = load volatile i64, i64 *%ptr		%l06 = load volatile i64, i64 *%ptr
%l07 = load volatile i64, i64 *%ptr		%l07 = load volatile i64, i64 *%ptr
; [36 lines of @func1 not shown]
store volatile i64 %add14, i64 *%ptr		store volatile i64 %add14, i64 *%ptr
store volatile i64 %add15, i64 *%ptr		store volatile i64 %add15, i64 *%ptr
ret void		ret void
}		}


; Spill all FPRs and VRs		; Spill all FPRs and VRs
; CHECK-LABEL: func2		; CHECK-LABEL: func2
		; CHECK64: stmg 6, 7, 1744(4)
		; CHECK64: aghi 4, -320
; CHECK64: std 15, {{[0-9]+}}(4) * 8-byte Folded Spill		; CHECK64: std 15, {{[0-9]+}}(4) * 8-byte Folded Spill
; CHECK64: std 14, {{[0-9]+}}(4) * 8-byte Folded Spill		; CHECK64: std 14, {{[0-9]+}}(4) * 8-byte Folded Spill
; CHECK64: std 13, {{[0-9]+}}(4) * 8-byte Folded Spill		; CHECK64: std 13, {{[0-9]+}}(4) * 8-byte Folded Spill
; CHECK64: std 12, {{[0-9]+}}(4) * 8-byte Folded Spill		; CHECK64: std 12, {{[0-9]+}}(4) * 8-byte Folded Spill
; CHECK64: std 11, {{[0-9]+}}(4) * 8-byte Folded Spill		; CHECK64: std 11, {{[0-9]+}}(4) * 8-byte Folded Spill
; CHECK64: std 10, {{[0-9]+}}(4) * 8-byte Folded Spill		; CHECK64: std 10, {{[0-9]+}}(4) * 8-byte Folded Spill
; CHECK64: std 9, {{[0-9]+}}(4) * 8-byte Folded Spill		; CHECK64: std 9, {{[0-9]+}}(4) * 8-byte Folded Spill
; CHECK64: std 8, {{[0-9]+}}(4) * 8-byte Folded Spill		; CHECK64: std 8, {{[0-9]+}}(4) * 8-byte Folded Spill
; CHECK64: vst 23, {{[0-9]+}}(4), 4 * 16-byte Folded Spill		; CHECK64: vst 23, {{[0-9]+}}(4), 4 * 16-byte Folded Spill
; CHECK64: vst 22, {{[0-9]+}}(4), 4 * 16-byte Folded Spill		; CHECK64: vst 22, {{[0-9]+}}(4), 4 * 16-byte Folded Spill
; CHECK64: vst 21, {{[0-9]+}}(4), 4 * 16-byte Folded Spill		; CHECK64: vst 21, {{[0-9]+}}(4), 4 * 16-byte Folded Spill
; CHECK64: vst 20, {{[0-9]+}}(4), 4 * 16-byte Folded Spill		; CHECK64: vst 20, {{[0-9]+}}(4), 4 * 16-byte Folded Spill
; CHECK64: vst 19, {{[0-9]+}}(4), 4 * 16-byte Folded Spill		; CHECK64: vst 19, {{[0-9]+}}(4), 4 * 16-byte Folded Spill
; CHECK64: vst 18, {{[0-9]+}}(4), 4 * 16-byte Folded Spill		; CHECK64: vst 18, {{[0-9]+}}(4), 4 * 16-byte Folded Spill
; CHECK64: vst 17, {{[0-9]+}}(4), 4 * 16-byte Folded Spill		; CHECK64: vst 17, {{[0-9]+}}(4), 4 * 16-byte Folded Spill
; CHECK64: vst 16, {{[0-9]+}}(4), 4 * 16-byte Folded Spill		; CHECK64: vst 16, {{[0-9]+}}(4), 4 * 16-byte Folded Spill

		; CHECK64: ld 15, {{[0-9]+}}(4) * 8-byte Folded Reload
		; CHECK64: ld 14, {{[0-9]+}}(4) * 8-byte Folded Reload
		; CHECK64: ld 13, {{[0-9]+}}(4) * 8-byte Folded Reload
		; CHECK64: ld 12, {{[0-9]+}}(4) * 8-byte Folded Reload
		; CHECK64: ld 11, {{[0-9]+}}(4) * 8-byte Folded Reload
		; CHECK64: ld 10, {{[0-9]+}}(4) * 8-byte Folded Reload
		; CHECK64: ld 9, {{[0-9]+}}(4) * 8-byte Folded Reload
		; CHECK64: ld 8, {{[0-9]+}}(4) * 8-byte Folded Reload
		; CHECK64: vl 23, {{[0-9]+}}(4), 4 * 16-byte Folded Reload
		; CHECK64: vl 22, {{[0-9]+}}(4), 4 * 16-byte Folded Reload
		; CHECK64: vl 21, {{[0-9]+}}(4), 4 * 16-byte Folded Reload
		; CHECK64: vl 20, {{[0-9]+}}(4), 4 * 16-byte Folded Reload
		; CHECK64: vl 19, {{[0-9]+}}(4), 4 * 16-byte Folded Reload
		; CHECK64: vl 18, {{[0-9]+}}(4), 4 * 16-byte Folded Reload
		; CHECK64: vl 17, {{[0-9]+}}(4), 4 * 16-byte Folded Reload
		; CHECK64: vl 16, {{[0-9]+}}(4), 4 * 16-byte Folded Reload
		; CHECK64: lg 7, 2072(4)
		; CHECK64: aghi 4, 320
		; CHECK64: b 2(7)

define void @func2(double *%ptr, <2 x i64> *%vec_ptr) {		define void @func2(double *%ptr, <2 x i64> *%vec_ptr) {
%l00 = load volatile double, double *%ptr		%l00 = load volatile double, double *%ptr
%l01 = load volatile double, double *%ptr		%l01 = load volatile double, double *%ptr
%l02 = load volatile double, double *%ptr		%l02 = load volatile double, double *%ptr
%l03 = load volatile double, double *%ptr		%l03 = load volatile double, double *%ptr
%l04 = load volatile double, double *%ptr		%l04 = load volatile double, double *%ptr
%l05 = load volatile double, double *%ptr		%l05 = load volatile double, double *%ptr
%l06 = load volatile double, double *%ptr		%l06 = load volatile double, double *%ptr
; [133 lines of @func2 not shown]
store volatile <2 x i64> %vadd27, <2 x i64> *%vec_ptr		store volatile <2 x i64> %vadd27, <2 x i64> *%vec_ptr
store volatile <2 x i64> %vadd28, <2 x i64> *%vec_ptr		store volatile <2 x i64> %vadd28, <2 x i64> *%vec_ptr
store volatile <2 x i64> %vadd29, <2 x i64> *%vec_ptr		store volatile <2 x i64> %vadd29, <2 x i64> *%vec_ptr
store volatile <2 x i64> %vadd30, <2 x i64> *%vec_ptr		store volatile <2 x i64> %vadd30, <2 x i64> *%vec_ptr
store volatile <2 x i64> %vadd31, <2 x i64> *%vec_ptr		store volatile <2 x i64> %vadd31, <2 x i64> *%vec_ptr
ret void		ret void
}		}

declare i64 @fun(i64 %arg0)		; Big stack frame, force the use of agfi before stmg
		; despite not requiring stack extension routine.
		; CHECK64: agfi 4, -1040768
		; CHECK64: stmg 6, 7, 2064(4)
		; CHECK64: agfi 4, 1040768
		define void @func3() {
		%arr = alloca [130070 x i64], align 8
		%ptr = bitcast [130070 x i64]* %arr to i8*
		call i64 (i8*) @fun1(i8* %ptr)
		ret void
		}

		; Requires the saving of r4 due to variable sized
		; object in stack frame. (Eg: VLA)
		; CHECK64: stmg 4, 8, 1856(4)
		; CHECK64: aghi 4, -192
		; CHECK64: lmg 4, 8, 2048(4)
		define i64 @func4(i64 %n) {
		%vla = alloca i64, i64 %n, align 8
		%call = call i64 @fun2(i64 %n, i64* nonnull %vla, i64* nonnull %vla)
		ret i64 %call
		}

		; Require saving of r4 and in addition, a displacement large enough
		; to force use of agfi before stmg.
		; CHECK64: lgr 0, 4
		; CHECK64: agfi 4, -1040192
		; CHECK64: stmg 4, 8, 2048(4)
		; CHECK64: lmg 4, 8, 2048(4)
		define i64 @func5(i64 %n) {
		%vla = alloca i64, i64 %n, align 8
		%arr = alloca [130000 x i64], align 8
		%ptr = bitcast [130000 x i64]* %arr to i64*
		%call = call i64 @fun2(i64 %n, i64* nonnull %vla, i64* %ptr)
		ret i64 %call
		}

		declare i64 @fun(i64 %arg0)
		declare i64 @fun1(i8* %ptr)
		declare i64 @fun2(i64 %n, i64* %arr0, i64* %arr1)
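
The displacements in the CHECK lines of this test follow a simple pattern that can be reproduced with a short calculation. The sketch below rests on several assumptions. The bias of 2048 and the 8-byte slot per GPR starting at r4 are inferred from the expected output (for example `lmg 4, 8, 2048(4)` and `lg 7, 2072(4)`). The signed 20-bit displacement limit for stmg is assumed to be what forces the stack adjustment to come first for big frames. All names are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed XPLINK64 stack pointer bias, inferred from the CHECK lines.
constexpr int64_t kStackPointerBias = 2048;

// Offset of the save slot for firstGPR relative to the final (adjusted)
// stack pointer, assuming 8-byte slots beginning with r4.
constexpr int64_t saveSlotOffset(unsigned firstGPR) {
  return kStackPointerBias + 8 * (static_cast<int64_t>(firstGPR) - 4);
}

// Assumed encoding limit: stmg takes a signed 20-bit displacement.
constexpr bool fitsSigned20(int64_t v) {
  return v >= -524288 && v <= 524287;
}

struct SavePlan {
  bool storeBeforeAdjust; // true: stmg is emitted before the SP decrement
  int64_t displacement;   // displacement to encode in the stmg
};

SavePlan planGPRSave(unsigned firstGPR, int64_t dsaSize) {
  assert(firstGPR >= 4 && firstGPR <= 15 && "expected a GPR in r4..r15");
  const int64_t slot = saveSlotOffset(firstGPR);
  // Storing before the decrement addresses the slot relative to the old
  // stack pointer, hence "2064 - dsa_size" for a save starting at r6.
  const int64_t early = slot - dsaSize;
  if (fitsSigned20(early))
    return {true, early};
  // Otherwise adjust the stack pointer first and use the plain slot offset.
  return {false, slot};
}
```

Under these assumptions the sketch yields 1872 for func0 (192-byte frame), 1904 for func1 (160 bytes), 1744 for func2 (320 bytes), and 1856 for func4 (save starting at r4). For func3 and func5 the early displacement does not fit, so the store follows the agfi and uses 2064 and 2048 respectively.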