This is an archive of the discontinued LLVM Phabricator instance.

[mips] Optimize stack pointer adjustments.
ClosedPublic

Authored by sdardis on Jun 14 2016, 2:17 AM.


Details

Reviewers

dsanders
vkalintiris

Summary

Instead of always using addu to adjust the stack pointer when the
size is out of the range of an addiu instruction, use subu so that
a smaller constant can be generated.

This can give savings of ~3 instructions whenever a function has
a stack frame whose size is out of range of an addiu instruction.
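
As a rough illustration of the idea (not the patch itself), the sketch below models the decision in plain C++ for 32-bit amounts. The names `planAdjust`, `splitForLuiAddiu`, `SpAdjust` and `SplitImm` are hypothetical and are not LLVM APIs. It assumes the constant is built with a lui/addiu pair, as the mips32 check lines in this revision do; the real constant synthesis lives in MipsAnalyzeImmediate and handles more cases.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical result type: how the stack pointer would be adjusted.
struct SpAdjust {
  bool FitsAddiu;   // amount fits the signed 16-bit immediate of addiu
  bool UseSub;      // true: subu sp, sp, reg; false: addu sp, sp, reg
  int64_t Constant; // immediate, or the value to materialize in a register
};

// Mirrors the shape of the patched logic: small amounts use addiu directly,
// large negative amounts are negated and subtracted instead of added.
SpAdjust planAdjust(int64_t Amount) {
  // Assumption: the most negative value is never passed (it cannot be negated).
  assert(Amount != std::numeric_limits<int64_t>::min());
  if (Amount >= -32768 && Amount <= 32767)
    return {true, false, Amount};
  if (Amount < 0)
    return {false, true, -Amount};
  return {false, false, Amount};
}

// Hypothetical operand pair for "lui r, Hi; addiu r, r, Lo".
struct SplitImm {
  uint16_t Hi; // operand of lui
  int16_t Lo;  // operand of addiu (sign-extended by the hardware)
};

// Split a 32-bit value V so that (Hi << 16) + Lo == V modulo 2^32.
// Adding 0x8000 before the shift compensates for addiu sign-extending Lo.
SplitImm splitForLuiAddiu(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V);
  SplitImm S;
  S.Hi = static_cast<uint16_t>((U + 0x8000u) >> 16);
  S.Lo = static_cast<int16_t>(static_cast<uint16_t>(U & 0xFFFFu));
  return S;
}
```

For a frame of 65536 + 8 bytes, `splitForLuiAddiu(-65544)` gives the old operands (lui 65535, addiu -8) and `splitForLuiAddiu(65544)` gives the new ones (lui 1, addiu 8), which matches the eh-dwarf-cfa.ll changes in this revision. On mips32 both forms need the same number of instructions; judging by the largeimmprinting.ll changes, the saving shows up on mips64.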

This change may break some naive stack unwinders.

Partially resolves PR/26291.

Thanks to David Chisnall for reporting the issue.


Event Timeline

sdardis updated this revision to Diff 60659. Jun 14 2016, 2:17 AM

sdardis retitled this revision to [mips] Optimize stack pointer adjustments.

sdardis updated this object.

sdardis added a reviewer: dsanders.

sdardis set the repository for this revision to rL LLVM.

sdardis added subscribers: llvm-commits, theraven.

Herald added a reviewer: vkalintiris. Jun 14 2016, 2:18 AM

Herald added subscribers: sdardis, dsanders.

I misspoke there; this only partially resolves PR/26291. It fixes cases where functions with a large stack frame have a long prolog section for decrementing the stack pointer.

Added Nitesh, Mohit, Sagar, and Bhushan in case this requires changes to lldb.

Just one question about a fixme and a few minor nits.

lib/Target/Mips/MipsSEInstrInfo.cpp
446–456	Could you put braces around this now that it's multiple lines?
457–458	These suggestions are for another patch but just to mention them: MIPS32R6/MIPS64R6 can add the immediate without materializing it first using AUI/DAUI/DATI/DAHI. MIPS32R5/MIPS64R5 with MSA, and MIPS32R6/MIPS64R6 can improve on this using LSA/DLSA to add 17-20 bit immediates in two instructions instead of three as long as the amount is appropriately aligned (which is always true for 17-19 bit, and true on N32/N64 for 20-bit).
test/CodeGen/Mips/cstmaterialization/stack.ll
3–4	Could you add the N32 case?
30–32	Can you clarify what needs fixing here? Is it just the duplication or is there something else?
test/CodeGen/Mips/eh-dwarf-cfa.ll
16	Could you add a colon to each of these to reduce the chance of an accidental match on something like $f1 or 0xf1?
test/CodeGen/Mips/largeimm1.ll
9	This will match the 'f' in '.file ...' instead of the function label 'f:'
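
The two label comments above come down to substring matching. The sketch below is a deliberately simplified model, using a plain `std::string::find` rather than FileCheck's actual matching rules, and the assembly text in `SampleAsm` is a hypothetical stand-in for llc output. It shows why a bare `f` can land inside `.file` while `f:` reaches the function label.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical llc output for a function named f (assumed, for illustration).
const std::string SampleAsm =
    "\t.text\n"
    "\t.file\t\"<stdin>\"\n"
    "\t.globl\tf\n"
    "f:\n"
    "\tlui\t$1, 16384\n";

// Offset of the first occurrence of Pattern in Text, or std::string::npos.
// This is a plain substring search, a simplification of what FileCheck does.
std::size_t firstMatch(const std::string &Text, const std::string &Pattern) {
  assert(!Pattern.empty());
  return Text.find(Pattern);
}
```

With this input, `firstMatch(SampleAsm, "f")` points at the `f` of `.file`, whereas `firstMatch(SampleAsm, "f:")` points at the label line, which is the position the later CHECK lines are meant to start from.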

Addressed review comments

lib/Target/Mips/MipsSEInstrInfo.cpp
457–458	I hadn't thought about using (d)lsa to synthesize constants, but that's changes to MipsAnalyzeImmediate. This patch is to make a relatively tiny change to avoid some bad cases. I'll look at R6ifying the return sequence after R6ifying constant synthesis.
test/CodeGen/Mips/cstmaterialization/stack.ll
30–32	For mips64 we repeatedly synthesize a large offset of the current stack pointer:

	lui $5, 16
	daddu $5, $sp, $5
	sd $ra, 24($5) # 8-byte Folded Spill
	lui $ra, 16
	daddu $ra, $sp, $ra
	sd $gp, 16($ra) # 8-byte Folded Spill

The second spill could have re-used $5 with the offset of 16. This also occurs when those values are reloaded. Turns out I missed one of them.

LGTM with one more nit.

test/CodeGen/Mips/cstmaterialization/stack.ll
30–32	Thanks. In that case, can we phrase the comment in terms of an action to take in the future (e.g. 'fix the duplicated address generation').

This revision is now accepted and ready to land. Jun 14 2016, 5:34 AM

Thanks, changed the comment in question to:

; FIXME:
; These are here to match other lui's used in address computations. We need to
; investigate why address computations are not CSE'd. Or implement it.

Committed as rL272666.

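Revision Contents

Path	Size
lib/Target/Mips/MCTargetDesc/MipsABIInfo.h	1 line
lib/Target/Mips/MCTargetDesc/MipsABIInfo.cpp	4 lines
lib/Target/Mips/MipsSEInstrInfo.cpp	16 lines
test/CodeGen/Mips/cstmaterialization/stack.ll	55 lines
test/CodeGen/Mips/eh-dwarf-cfa.ll	14 lines
test/CodeGen/Mips/largeimm1.ll	10 lines
test/CodeGen/Mips/largeimmprinting.ll	28 lines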

Diff 60666

lib/Target/Mips/MCTargetDesc/MipsABIInfo.h

[63 lines not shown; context: public:]
unsigned GetStackPtr() const;		unsigned GetStackPtr() const;
unsigned GetFramePtr() const;		unsigned GetFramePtr() const;
unsigned GetBasePtr() const;		unsigned GetBasePtr() const;
unsigned GetGlobalPtr() const;		unsigned GetGlobalPtr() const;
unsigned GetNullPtr() const;		unsigned GetNullPtr() const;
unsigned GetZeroReg() const;		unsigned GetZeroReg() const;
unsigned GetPtrAdduOp() const;		unsigned GetPtrAdduOp() const;
unsigned GetPtrAddiuOp() const;		unsigned GetPtrAddiuOp() const;
		unsigned GetPtrSubuOp() const;
unsigned GetPtrAndOp() const;		unsigned GetPtrAndOp() const;
unsigned GetGPRMoveOp() const;		unsigned GetGPRMoveOp() const;
inline bool ArePtrs64bit() const { return IsN64(); }		inline bool ArePtrs64bit() const { return IsN64(); }
inline bool AreGprs64bit() const { return IsN32() \|\| IsN64(); }		inline bool AreGprs64bit() const { return IsN32() \|\| IsN64(); }

unsigned GetEhDataReg(unsigned I) const;		unsigned GetEhDataReg(unsigned I) const;
};		};
}		}

#endif		#endif

lib/Target/Mips/MCTargetDesc/MipsABIInfo.cpp

	[114 lines not shown]
	unsigned MipsABIInfo::GetPtrAdduOp() const {			unsigned MipsABIInfo::GetPtrAdduOp() const {
	return ArePtrs64bit() ? Mips::DADDu : Mips::ADDu;			return ArePtrs64bit() ? Mips::DADDu : Mips::ADDu;
	}			}

	unsigned MipsABIInfo::GetPtrAddiuOp() const {			unsigned MipsABIInfo::GetPtrAddiuOp() const {
	return ArePtrs64bit() ? Mips::DADDiu : Mips::ADDiu;			return ArePtrs64bit() ? Mips::DADDiu : Mips::ADDiu;
	}			}

				unsigned MipsABIInfo::GetPtrSubuOp() const {
				return ArePtrs64bit() ? Mips::DSUBu : Mips::SUBu;
				}

	unsigned MipsABIInfo::GetPtrAndOp() const {			unsigned MipsABIInfo::GetPtrAndOp() const {
	return ArePtrs64bit() ? Mips::AND64 : Mips::AND;			return ArePtrs64bit() ? Mips::AND64 : Mips::AND;
	}			}

	unsigned MipsABIInfo::GetGPRMoveOp() const {			unsigned MipsABIInfo::GetGPRMoveOp() const {
	return ArePtrs64bit() ? Mips::OR64 : Mips::OR;			return ArePtrs64bit() ? Mips::OR64 : Mips::OR;
	}			}

	[11 lines not shown]

lib/Target/Mips/MipsSEInstrInfo.cpp

	[14 lines not shown]
	#include "InstPrinter/MipsInstPrinter.h"			#include "InstPrinter/MipsInstPrinter.h"
	#include "MipsAnalyzeImmediate.h"			#include "MipsAnalyzeImmediate.h"
	#include "MipsMachineFunction.h"			#include "MipsMachineFunction.h"
	#include "MipsTargetMachine.h"			#include "MipsTargetMachine.h"
	#include "llvm/ADT/STLExtras.h"			#include "llvm/ADT/STLExtras.h"
	#include "llvm/CodeGen/MachineInstrBuilder.h"			#include "llvm/CodeGen/MachineInstrBuilder.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"
	#include "llvm/Support/ErrorHandling.h"			#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/MathExtras.h"
	#include "llvm/Support/TargetRegistry.h"			#include "llvm/Support/TargetRegistry.h"

	using namespace llvm;			using namespace llvm;

	MipsSEInstrInfo::MipsSEInstrInfo(const MipsSubtarget &STI)			MipsSEInstrInfo::MipsSEInstrInfo(const MipsSubtarget &STI)
	: MipsInstrInfo(STI, STI.getRelocationModel() == Reloc::PIC_ ? Mips::B			: MipsInstrInfo(STI, STI.getRelocationModel() == Reloc::PIC_ ? Mips::B
	: Mips::J),			: Mips::J),
	RI() {}			RI() {}
	[401 lines not shown]
	}			}

	/// Adjust SP by Amount bytes.			/// Adjust SP by Amount bytes.
	void MipsSEInstrInfo::adjustStackPtr(unsigned SP, int64_t Amount,			void MipsSEInstrInfo::adjustStackPtr(unsigned SP, int64_t Amount,
	MachineBasicBlock &MBB,			MachineBasicBlock &MBB,
	MachineBasicBlock::iterator I) const {			MachineBasicBlock::iterator I) const {
	MipsABIInfo ABI = Subtarget.getABI();			MipsABIInfo ABI = Subtarget.getABI();
	DebugLoc DL;			DebugLoc DL;
	unsigned ADDu = ABI.GetPtrAdduOp();
	unsigned ADDiu = ABI.GetPtrAddiuOp();			unsigned ADDiu = ABI.GetPtrAddiuOp();

	if (Amount == 0)			if (Amount == 0)
	return;			return;

	if (isInt<16>(Amount))// addi sp, sp, amount			if (isInt<16>(Amount)) {
				// addi sp, sp, amount
	BuildMI(MBB, I, DL, get(ADDiu), SP).addReg(SP).addImm(Amount);			BuildMI(MBB, I, DL, get(ADDiu), SP).addReg(SP).addImm(Amount);
	else { // Expand immediate that doesn't fit in 16-bit.			} else {
				// For numbers which are not 16bit integers we synthesize Amount inline
				// then add or subtract it from sp.
				unsigned Opc = ABI.GetPtrAdduOp();
				if (Amount < 0) {
				Opc = ABI.GetPtrSubuOp();
				Amount = -Amount;
				}
				dsanders (inline): Could you put braces around this now that it's multiple lines?
	unsigned Reg = loadImmediate(Amount, MBB, I, DL, nullptr);			unsigned Reg = loadImmediate(Amount, MBB, I, DL, nullptr);
	BuildMI(MBB, I, DL, get(ADDu), SP).addReg(SP).addReg(Reg, RegState::Kill);			BuildMI(MBB, I, DL, get(Opc), SP).addReg(SP).addReg(Reg, RegState::Kill);
				dsanders (inline): These suggestions are for another patch but just to mention them: MIPS32R6/MIPS64R6 can add the immediate without materializing it first using AUI/DAUI/DATI/DAHI. MIPS32R5/MIPS64R5 with MSA, and MIPS32R6/MIPS64R6 can improve on this using LSA/DLSA to add 17-20 bit immediates in two instructions instead of three as long as the amount is appropriately aligned (which is always true for 17-19 bit, and true on N32/N64 for 20-bit).
				sdardis (author, inline): I hadn't thought about using (d)lsa to synthesize constants, but that's changes to MipsAnalyzeImmediate. This patch is to make a relatively tiny change to avoid some bad cases. I'll look at R6ifying the return sequence after R6ifying constant synthesis.
	}			}
	}			}

	/// This function generates the sequence of instructions needed to get the			/// This function generates the sequence of instructions needed to get the
	/// result of adding register REG and immediate IMM.			/// result of adding register REG and immediate IMM.
	unsigned MipsSEInstrInfo::loadImmediate(int64_t Imm, MachineBasicBlock &MBB,			unsigned MipsSEInstrInfo::loadImmediate(int64_t Imm, MachineBasicBlock &MBB,
	MachineBasicBlock::iterator II,			MachineBasicBlock::iterator II,
	const DebugLoc &DL,			const DebugLoc &DL,
	[267 lines not shown]

test/CodeGen/Mips/cstmaterialization/stack.ll

This file was added.

				; RUN: llc -march=mipsel -mcpu=mips32 < %s \| FileCheck %s -check-prefix=CHECK-MIPS32
				; RUN: llc -march=mips64el -mcpu=mips64 < %s \| \
				; RUN: FileCheck %s -check-prefix=CHECK-MIPS64
				; RUN: llc -march=mipsel -mcpu=mips64 -target-abi n32 < %s \| FileCheck %s -check-prefix=CHECK-MIPSN32
				dsanders (inline): Could you add the N32 case?

				; Test that the expansion of ADJCALLSTACKDOWN and ADJCALLSTACKUP generate
				; (d)subu and (d)addu rather than just (d)addu. The (d)subu sequences are
				; generally shorter as the constant that has to be materialized is smaller.

				define i32 @main() {
				entry:
				%z = alloca [1048576 x i8], align 1
				%arraydecay = getelementptr inbounds [1048576 x i8], [1048576 x i8]* %z, i32 0, i32 0
				%call = call i32 @foo(i8* %arraydecay)
				ret i32 0
				; CHECK-LABEL: main

				; CHECK-MIPS32: lui $[[R0:[0-9]+]], 16
				; CHECK-MIPS32: addiu $[[R0]], $[[R0]], 24
				; CHECK-MIPS32: subu $sp, $sp, $[[R0]]

				; CHECK-MIPS32: lui $[[R1:[0-9]+]], 16
				; CHECK-MIPS32: addiu $[[R1]], $[[R1]], 24
				; CHECK-MIPS32: addu $sp, $sp, $[[R1]]

				; CHECK-MIPS64: lui $[[R0:[0-9]+]], 1
				; CHECK-MIPS64: daddiu $[[R0]], $[[R0]], 32
				; CHECK-MIPS64: dsubu $sp, $sp, $[[R0]]

				; FIXME:
				; These are here to match other lui's used in offset computations and they're
				; also duplicated.
				dsanders (inline): Can you clarify what needs fixing here? Is it just the duplication or is there something else?
				sdardis (author, inline): For mips64 we repeatedly synthesize a large offset of the current stack pointer:
					lui $5, 16
					daddu $5, $sp, $5
					sd $ra, 24($5) # 8-byte Folded Spill
					lui $ra, 16
					daddu $ra, $sp, $ra
					sd $gp, 16($ra) # 8-byte Folded Spill
				The second spill could have re-used $5 with the offset of 16. This also occurs when those values are reloaded. Turns out I missed one of them.
				dsanders (inline): Thanks. In that case, can we phrase the comment in terms of an action to take in the future (e.g. 'fix the duplicated address generation').

				; CHECK-MIPS64: lui
				; CHECK-MIPS64: lui
				; CHECK-MIPS64: lui
				; CHECK-MIPS64: lui

				; CHECK-MIPS64: lui $[[R1:[0-9]+]], 16
				; CHECK-MIPS64: daddiu $[[R1]], $[[R1]], 32
				; CHECK-MIPS64: daddu $sp, $sp, $[[R1]]

				; CHECK-MIPSN32: lui $[[R0:[0-9]+]], 16
				; CHECK-MIPSN32: addiu $[[R0]], $[[R0]], 16
				; CHECK-MIPSN32: subu $sp, $sp, $[[R0]]

				; CHECK-MIPSN32: lui $[[R1:[0-9]+]], 16
				; CHECK-MIPSN32: addiu $[[R1]], $[[R1]], 16
				; CHECK-MIPSN32: addu $sp, $sp, $[[R1]]


				}

				declare i32 @foo(i8*)

test/CodeGen/Mips/eh-dwarf-cfa.ll

	; RUN: llc -march=mipsel -mcpu=mips32 < %s \| FileCheck %s			; RUN: llc -march=mipsel -mcpu=mips32 < %s \| FileCheck %s
	; RUN: llc -march=mips64el -mcpu=mips4 < %s \| \			; RUN: llc -march=mips64el -mcpu=mips4 < %s \| \
	; RUN: FileCheck %s -check-prefix=CHECK-MIPS64			; RUN: FileCheck %s -check-prefix=CHECK-MIPS64
	; RUN: llc -march=mips64el -mcpu=mips64 < %s \| \			; RUN: llc -march=mips64el -mcpu=mips64 < %s \| \
	; RUN: FileCheck %s -check-prefix=CHECK-MIPS64			; RUN: FileCheck %s -check-prefix=CHECK-MIPS64

	declare i8* @llvm.eh.dwarf.cfa(i32) nounwind			declare i8* @llvm.eh.dwarf.cfa(i32) nounwind
	declare i8* @llvm.frameaddress(i32) nounwind readnone			declare i8* @llvm.frameaddress(i32) nounwind readnone

	define i8* @f1() nounwind {			define i8* @f1() nounwind {
	entry:			entry:
	%x = alloca [32 x i8], align 1			%x = alloca [32 x i8], align 1
	%0 = call i8* @llvm.eh.dwarf.cfa(i32 0)			%0 = call i8* @llvm.eh.dwarf.cfa(i32 0)
	ret i8* %0			ret i8* %0

				; CHECK-LABEL: f1:
				dsanders (inline): Could you add a colon to each of these to reduce the chance of an accidental match on something like $f1 or 0xf1?

	; CHECK: addiu $sp, $sp, -32			; CHECK: addiu $sp, $sp, -32
	; CHECK: addiu $2, $sp, 32			; CHECK: addiu $2, $sp, 32
	}			}


	define i8* @f2() nounwind {			define i8* @f2() nounwind {
	entry:			entry:
	%x = alloca [65536 x i8], align 1			%x = alloca [65536 x i8], align 1
	%0 = call i8* @llvm.eh.dwarf.cfa(i32 0)			%0 = call i8* @llvm.eh.dwarf.cfa(i32 0)
	ret i8* %0			ret i8* %0

				; CHECK-LABEL: f2:

	; check stack size (65536 + 8)			; check stack size (65536 + 8)
	; CHECK: lui $[[R0:[a-z0-9]+]], 65535			; CHECK: lui $[[R0:[a-z0-9]+]], 1
	; CHECK: addiu $[[R0]], $[[R0]], -8			; CHECK: addiu $[[R0]], $[[R0]], 8
	; CHECK: addu $sp, $sp, $[[R0]]			; CHECK: subu $sp, $sp, $[[R0]]

	; check return value ($sp + stack size)			; check return value ($sp + stack size)
	; CHECK: lui $[[R1:[a-z0-9]+]], 1			; CHECK: lui $[[R1:[a-z0-9]+]], 1
	; CHECK: addu $[[R1]], $sp, $[[R1]]			; CHECK: addu $[[R1]], $sp, $[[R1]]
	; CHECK: addiu $2, $[[R1]], 8			; CHECK: addiu $2, $[[R1]], 8
	}			}


	define i32 @f3() nounwind {			define i32 @f3() nounwind {
	entry:			entry:
	%x = alloca [32 x i8], align 1			%x = alloca [32 x i8], align 1
	%0 = call i8* @llvm.eh.dwarf.cfa(i32 0)			%0 = call i8* @llvm.eh.dwarf.cfa(i32 0)
	%1 = ptrtoint i8* %0 to i32			%1 = ptrtoint i8* %0 to i32
	%2 = call i8* @llvm.frameaddress(i32 0)			%2 = call i8* @llvm.frameaddress(i32 0)
	%3 = ptrtoint i8* %2 to i32			%3 = ptrtoint i8* %2 to i32
	%add = add i32 %1, %3			%add = add i32 %1, %3
	ret i32 %add			ret i32 %add

				; CHECK-LABEL: f3:

	; CHECK: addiu $sp, $sp, -40			; CHECK: addiu $sp, $sp, -40

	; check return value ($fp + stack size + $fp)			; check return value ($fp + stack size + $fp)
	; CHECK: addiu $[[R0:[a-z0-9]+]], $fp, 40			; CHECK: addiu $[[R0:[a-z0-9]+]], $fp, 40
	; CHECK: addu $2, $[[R0]], $fp			; CHECK: addu $2, $[[R0]], $fp
	}			}


	define i8* @f4() nounwind {			define i8* @f4() nounwind {
	entry:			entry:
	%x = alloca [32 x i8], align 1			%x = alloca [32 x i8], align 1
	%0 = call i8* @llvm.eh.dwarf.cfa(i32 0)			%0 = call i8* @llvm.eh.dwarf.cfa(i32 0)
	ret i8* %0			ret i8* %0

				; CHECK-LABEL: f4:

	; CHECK-MIPS64: daddiu $sp, $sp, -32			; CHECK-MIPS64: daddiu $sp, $sp, -32
	; CHECK-MIPS64: daddiu $2, $sp, 32			; CHECK-MIPS64: daddiu $2, $sp, 32
	}			}

test/CodeGen/Mips/largeimm1.ll

	; RUN: llc -march=mipsel -relocation-model=pic < %s \| FileCheck %s			; RUN: llc -march=mipsel -relocation-model=pic < %s \| FileCheck %s

	; CHECK: lui ${{[0-9]+}}, 49152
	; CHECK: lui ${{[0-9]+}}, 16384
	define void @f() nounwind {			define void @f() nounwind {
	entry:			entry:
	%a1 = alloca [1073741824 x i8], align 1			%a1 = alloca [1073741824 x i8], align 1
	%arrayidx = getelementptr inbounds [1073741824 x i8], [1073741824 x i8]* %a1, i32 0, i32 1048676			%arrayidx = getelementptr inbounds [1073741824 x i8], [1073741824 x i8]* %a1, i32 0, i32 1048676
	call void @f2(i8* %arrayidx) nounwind			call void @f2(i8* %arrayidx) nounwind
	ret void			ret void
				; CHECK-LABEL: f:
				dsanders (inline): This will match the 'f' in '.file ...' instead of the function label 'f:'

				; CHECK: lui $[[R0:[a-z0-9]+]], 16384
				; CHECK: addiu $[[R1:[a-z0-9]+]], $[[R0]], 24
				; CHECK: subu $sp, $sp, $[[R1]]

				; CHECK: lui $[[R2:[a-z0-9]+]], 16384
				; CHECK: addu ${{[0-9]+}}, $sp, $[[R2]]
	}			}

	declare void @f2(i8*)			declare void @f2(i8*)

test/CodeGen/Mips/largeimmprinting.ll

	; RUN: llc -march=mipsel -relocation-model=pic < %s \| FileCheck %s -check-prefix=32			; RUN: llc -march=mipsel -relocation-model=pic < %s \| FileCheck %s -check-prefix=32
	; RUN: llc -march=mips64el -mcpu=mips4 -target-abi=n64 -relocation-model=pic < %s \| \			; RUN: llc -march=mips64el -mcpu=mips4 -target-abi=n64 -relocation-model=pic < %s \| \
	; RUN: FileCheck %s -check-prefix=64			; RUN: FileCheck %s -check-prefix=64
	; RUN: llc -march=mips64el -mcpu=mips64 -target-abi=n64 -relocation-model=pic < %s \| \			; RUN: llc -march=mips64el -mcpu=mips64 -target-abi=n64 -relocation-model=pic < %s \| \
	; RUN: FileCheck %s -check-prefix=64			; RUN: FileCheck %s -check-prefix=64

	%struct.S1 = type { [65536 x i8] }			%struct.S1 = type { [65536 x i8] }

	@s1 = external global %struct.S1			@s1 = external global %struct.S1

	define void @f() nounwind {			define void @f() nounwind {
	entry:			entry:
	; 32: lui $[[R0:[0-9]+]], 65535			; 32: lui $[[R0:[0-9]+]], 1
	; 32: addiu $[[R0]], $[[R0]], -24			; 32: addiu $[[R0]], $[[R0]], 24
	; 32: addu $sp, $sp, $[[R0]]			; 32: subu $sp, $sp, $[[R0]]
	; 32: lui $[[R1:[0-9]+]], 1			; 32: lui $[[R1:[0-9]+]], 1
	; 32: addu $[[R1]], $sp, $[[R1]]			; 32: addu $[[R1]], $sp, $[[R1]]
	; 32: sw $ra, 20($[[R1]])			; 32: sw $ra, 20($[[R1]])
	; 64: daddiu $[[R0:[0-9]+]], $zero, 1
	; 64: dsll $[[R0]], $[[R0]], 48			; 64: lui $[[R0:[0-9]+]], 1
	; 64: daddiu $[[R0]], $[[R0]], -1			; 64: daddiu $[[R0]], $[[R0]], 32
	; 64: dsll $[[R0]], $[[R0]], 16			; 64: dsubu $sp, $sp, $[[R0]]
	; 64: daddiu $[[R0]], $[[R0]], -32
	; 64: daddu $sp, $sp, $[[R0]]
	; 64: lui $[[R1:[0-9]+]], 1			; 64: lui $[[R1:[0-9]+]], 1
	; 64: daddu $[[R1]], $sp, $[[R1]]			; 64: daddu $[[R1]], $sp, $[[R1]]
	; 64: sd $ra, 24($[[R1]])			; 64: sd $ra, 24($[[R1]])

	%agg.tmp = alloca %struct.S1, align 1			%agg.tmp = alloca %struct.S1, align 1
	%tmp = getelementptr inbounds %struct.S1, %struct.S1* %agg.tmp, i32 0, i32 0, i32 0			%tmp = getelementptr inbounds %struct.S1, %struct.S1* %agg.tmp, i32 0, i32 0, i32 0
	call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* getelementptr inbounds (%struct.S1, %struct.S1* @s1, i32 0, i32 0, i32 0), i32 65536, i32 1, i1 false)			call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* getelementptr inbounds (%struct.S1, %struct.S1* @s1, i32 0, i32 0, i32 0), i32 65536, i32 1, i1 false)
	call void @f2(%struct.S1* byval %agg.tmp) nounwind			call void @f2(%struct.S1* byval %agg.tmp) nounwind
	ret void			ret void
	}			}

	declare void @f2(%struct.S1* byval)			declare void @f2(%struct.S1* byval)

	declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind			declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind
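
The `64:` check lines in largeimmprinting.ll are where the instruction saving is visible: five instructions to build the negative constant before, two to build its magnitude after. As a sanity check of that arithmetic, the sketch below models the three instructions involved with ordinary 64-bit integer operations. The function names `lui`, `daddiu`, `dsll`, `oldSequence` and `newSequence` are local helpers written for this example, not LLVM or assembler APIs, and the models ignore everything except the computed value.

```cpp
#include <cassert>
#include <cstdint>

// lui: place Imm in bits 31..16, then sign-extend the 32-bit result.
int64_t lui(uint16_t Imm) {
  return static_cast<int64_t>(
      static_cast<int32_t>(static_cast<uint32_t>(Imm) << 16));
}

// daddiu: 64-bit add of a sign-extended 16-bit immediate, wrapping on overflow.
int64_t daddiu(int64_t Rs, int16_t Imm) {
  return static_cast<int64_t>(static_cast<uint64_t>(Rs) +
                              static_cast<uint64_t>(static_cast<int64_t>(Imm)));
}

// dsll: 64-bit logical shift left by Sa bits.
int64_t dsll(int64_t Rt, unsigned Sa) {
  assert(Sa < 64);
  return static_cast<int64_t>(static_cast<uint64_t>(Rt) << Sa);
}

// Old check lines: the value later added to $sp with daddu.
int64_t oldSequence() {
  int64_t R = daddiu(0, 1); // daddiu r, $zero, 1
  R = dsll(R, 48);          // dsll   r, r, 48
  R = daddiu(R, -1);        // daddiu r, r, -1
  R = dsll(R, 16);          // dsll   r, r, 16
  R = daddiu(R, -32);       // daddiu r, r, -32
  return R;
}

// New check lines: the value later subtracted from $sp with dsubu.
int64_t newSequence() {
  int64_t R = lui(1); // lui    r, 1
  R = daddiu(R, 32);  // daddiu r, r, 32
  return R;
}
```

Under these models the old sequence yields the negative of what the new sequence yields, so adding the former to $sp and subtracting the latter describe the same frame of 65536 + 32 bytes.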