Details

Reviewers

Commits

rG740f9d79ca7e: Arguments spilled on the stack before a function call may have alignment…
rL248786: Arguments spilled on the stack before a function call may have

Summary

Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

The x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the value by which the stack
pointer is adjusted is a multiple of the maximal alignment seen during
its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)
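
The arithmetic behind this fix can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the patch: `roundUpToAlignment` is a local stand-in for the LLVM helper of a similar name, `staysAligned` and `alignedFrameSize` are hypothetical helpers, and the stack pointer value `0x1000` is an arbitrary 16-byte aligned address. The sizes come from the example above: 16 bytes for the spilled part of the vector plus 4 bytes for the `i32`.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in (assumption) for LLVM's RoundUpToAlignment helper:
// round Value up to the next multiple of Align.
constexpr uint64_t roundUpToAlignment(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical helper: starting from a stack pointer that is a multiple of
// Align, is it still a multiple of Align after subtracting FrameSize bytes?
constexpr bool staysAligned(uint64_t AlignedSP, uint64_t FrameSize,
                            uint64_t Align) {
  return (AlignedSP - FrameSize) % Align == 0;
}

// Hypothetical helper: frame size to subtract from the stack pointer, given
// the raw argument size and the largest alignment seen among the arguments.
uint64_t alignedFrameSize(uint64_t RawSize, uint64_t MaxAlign) {
  assert(MaxAlign && (MaxAlign & (MaxAlign - 1)) == 0 &&
         "alignment must be a power of two");
  return roundUpToAlignment(RawSize, MaxAlign);
}

// One 16-byte vector slot plus a 4-byte i32 gives the 20 from "subl $20".
static_assert(16 + 4 == 20, "raw call frame size in the example");
static_assert(roundUpToAlignment(20, 16) == 32, "size used after the fix");
static_assert(!staysAligned(0x1000, 20, 16), "subl $20 misaligns %esp");
static_assert(staysAligned(0x1000, 32, 16), "subl $32 keeps %esp aligned");
```

The compile-time checks mirror the two `subl` instructions shown in the summary: an adjustment of 20 breaks 16-byte alignment, while 32 preserves it.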

Diff Detail

Repository: rL LLVM

Event Timeline

jketema updated this revision to Diff 33126. Aug 25 2015, 2:39 PM

jketema retitled this revision from to [Codegen] Ensure stack is properly aligned for call argument initialization.

jketema updated this object.

jketema added a reviewer: rnk.

jketema added a subscriber: llvm-commits.

Herald added a subscriber: qcolombet. Aug 25 2015, 2:39 PM

rnk added inline comments. Aug 26 2015, 8:27 AM

include/llvm/CodeGen/CallingConvLower.h
204 ↗	(On Diff #33126)	I think this is really more like `MaxStackArgAlign`. It's the maximum alignment of arguments that ended up on the stack, right?
273 ↗	(On Diff #33126)	We should probably put some Doxygen here about what NextStackOffset really means.
274 ↗	(On Diff #33126)	Are you sure this is the right place to change? Isn't this method used to figure out where variadic argument packs start also? Shouldn't we not align StackOffset for that use case? I think it also gets used when we have a reserved stack frame (i.e. no dynamic alloca). As written, your code will overallocate some extra padding that isn't necessary.
275 ↗	(On Diff #33126)	Use RoundUpToAlignment.
406 ↗	(On Diff #33126)	RoundUpToAlignment here would be a nice cleanup.
409 ↗	(On Diff #33126)	This looks like `std::max(Align, MinStackAlign)` to me.
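
The two cleanups suggested for lines 406 and 409 can be illustrated with a small standalone sketch. It assumes that line 406 is the mask expression visible in the pre-patch `AllocateStack`, and that line 409 was an explicit compare-and-assign; the latter is a guess, since that version of the line is not shown on this page. `alignWithDivision` is a local stand-in for `RoundUpToAlignment`, and `formsAgree` and `maxWithBranch` are hypothetical names used only here.

```cpp
#include <algorithm>
#include <cassert>

// Mask form, as in the pre-patch AllocateStack. Only valid when Align is a
// power of two.
constexpr unsigned alignWithMask(unsigned Offset, unsigned Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Division form: a local stand-in (assumption) for RoundUpToAlignment.
constexpr unsigned alignWithDivision(unsigned Offset, unsigned Align) {
  return (Offset + Align - 1) / Align * Align;
}

// Returns true if both forms agree for every offset below Limit and every
// power-of-two alignment from 1 to 64.
bool formsAgree(unsigned Limit) {
  for (unsigned Align = 1; Align <= 64; Align *= 2)
    for (unsigned Offset = 0; Offset < Limit; ++Offset)
      if (alignWithMask(Offset, Align) != alignWithDivision(Offset, Align))
        return false;
  return true;
}

// Guessed shape of the code that the std::max suggestion replaces.
unsigned maxWithBranch(unsigned Align, unsigned Current) {
  if (Align > Current)
    Current = Align;
  return Current;
}
```

For power-of-two alignments the two rounding forms are interchangeable, so the cleanup is purely about readability; the same holds for replacing the branch with `std::max`.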

Thanks for your feedback! I'll address the easy bits, and have another look at the use of getNextStackOffset. I might need some help with that though, as I'm not very familiar with the code generation parts of llvm (still learning).

include/llvm/CodeGen/CallingConvLower.h
204 ↗	(On Diff #33126)	It was meant to be the minimum stack alignment I'll eventually need. I like your suggestion though.
274 ↗	(On Diff #33126)	I'll have another look at the use cases.

Addressed easy comments. getNextStackOffset concerns still to investigate (including adding Doxygen comment)

jketema added inline comments. Aug 26 2015, 3:57 PM

include/llvm/CodeGen/CallingConvLower.h
274–276 ↗	(On Diff #33261)	In the case of variadic arguments: can't we run into the same problems? If one of the arguments in the pack is a <4 x float> vector, then wouldn't there be cases where the X86 code generator uses a movaps instruction to load the vector into one of the XMM registers? Or will it always use movups in that case? And similarly for reserved stack frames?

Add Doxygen comment

Hi Reid,

Could you have another look at this? After having had a renewed look at the code, getNextStackOffset still seems the correct place for this change to me for the reasons I provided in the inline comments. However, if you think this really belongs somewhere else, would you be able to point me in the right direction?

Thanks,

Jeroen

Bump

rnk added inline comments. Sep 8 2015, 2:46 PM

include/llvm/CodeGen/CallingConvLower.h
274–279 ↗	(On Diff #33270)	Yes, but here's the 32-bit test case I'm imagining: void f(__m128 v, const char *f, ...) { va_list ap; va_start(ap, f); int d = va_arg(ap, int); } Do we have padding between 'f' and the variadic int parameter? I think the answer should be no, because if they were not variadic, they would not have padding between them.

bump

jketema added inline comments. Sep 15 2015, 1:35 PM

include/llvm/CodeGen/CallingConvLower.h
274–279 ↗	(On Diff #33270)	Sorry, I missed this before I bumped. Somehow the system didn't send me a notification email. I don't see why this is a problem: my patch only changes the alignment of the whole frame, not the padding between the individual members; the computation for that stays unchanged. Hence, there will only be additional padding after the very last of the variadic parameters supplied.

jketema added inline comments. Sep 15 2015, 1:48 PM

include/llvm/CodeGen/CallingConvLower.h
274–279 ↗	(On Diff #33270)	Reading back all the comments, I think I see what you're getting at: What you're saying is that `getNextStackOffset` is also used to compute the offset of the variadic arguments with respect to the beginning of the stack frame? If that's the case, then the patch would indeed be wrong, and I think we would then need both the old variant and the new one I'm proposing and adapt every use accordingly. Or do you see a better solution?

rnk added inline comments. Sep 15 2015, 1:52 PM

include/llvm/CodeGen/CallingConvLower.h
274–279 ↗	(On Diff #33270)	Exactly, `getNextStackOffset` currently has more than one use. You should take the example C code above and turn it into a test case for lit to show that at least this use isn't changing. Given that there is more than one use, then I think we should introduce a new method called `getAlignedCallFrameSize`, update the appropriate call sites that use this in PEI, and declare victory. :)
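
The distinction between the two accessors can be sketched with a toy model. `ToyCCState` and `analyzeFixedArgsOfF` are hypothetical names, not the LLVM `CCState` class; the model only tracks the running stack offset and the largest alignment requested. It assumes that both fixed arguments of `f(__m128 v, const char *f, ...)` are passed on the stack on 32-bit x86, with sizes and alignments of 16/16 and 4/4 bytes.

```cpp
#include <algorithm>
#include <cassert>

// Toy model (assumption) of the stack bookkeeping discussed in this review.
class ToyCCState {
  unsigned StackOffset = 0;
  unsigned MaxStackArgAlign = 1;

public:
  // Reserve Size bytes at the next offset that is a multiple of Align and
  // return that offset.
  unsigned allocateStack(unsigned Size, unsigned Align) {
    assert(Align && (Align & (Align - 1)) == 0 && "Align must be a power of 2");
    StackOffset = (StackOffset + Align - 1) & ~(Align - 1);
    unsigned Result = StackOffset;
    StackOffset += Size;
    MaxStackArgAlign = std::max(Align, MaxStackArgAlign);
    return Result;
  }

  // First free byte after the arguments seen so far. For a variadic callee
  // this is where the variadic pack starts, so it must not be padded.
  unsigned getNextStackOffset() const { return StackOffset; }

  // Amount to subtract from the stack pointer at a call site so that the
  // pointer keeps the largest alignment any stack argument asked for.
  unsigned getAlignedCallFrameSize() const {
    return (StackOffset + MaxStackArgAlign - 1) & ~(MaxStackArgAlign - 1);
  }
};

// Lay out the fixed arguments of f(__m128 v, const char *f, ...).
ToyCCState analyzeFixedArgsOfF() {
  ToyCCState State;
  State.allocateStack(16, 16); // __m128 v
  State.allocateStack(4, 4);   // const char *f
  return State;
}
```

In this model the variadic `int` starts at offset 20, directly after `f`, while the padded frame size is 32. The unpadded offset appears consistent with the `leal 28(%ebp), %eax` check in the committed test, assuming 8 bytes for the saved `%ebp` and the return address.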

Hi Reid,

I've introduced a getAlignedCallFrameSize method as suggested and changed call lowering in the X86 target appropriately. I've also introduced a test involving var args, as you suggested.

The X86 target also contains a call to getNextStackOffset in the code that determines whether tail call optimization can be performed. I'm not sure if that call should also be changed to getAlignedCallFrameSize. I'm also wondering whether getAlignedCallFrameSize should be introduced in other targets too.

Bump

lgtm

test/CodeGen/X86/win32-spill-xmm.ll
8 ↗	(On Diff #35166)	; CHECK: calll bar

This revision is now accepted and ready to land. Sep 28 2015, 11:40 AM

Closed by commit rL248786: Arguments spilled on the stack before a function call may have (authored by jketema). Sep 29 2015, 3:23 AM

This revision was automatically updated to reflect the committed changes.

Thanks for all your feedback Reid!

pengfei mentioned this in D114536: [X86][MS] Fix the wrong alignment of vector variable arguments on Win32. Jan 16 2022, 7:52 AM

Diff 35955

llvm/trunk/include/llvm/CodeGen/CallingConvLower.h

@@ ... @@ private:
   CallingConv::ID CallingConv;
   bool IsVarArg;
   MachineFunction &MF;
   const TargetRegisterInfo &TRI;
   SmallVectorImpl<CCValAssign> &Locs;
   LLVMContext &Context;

   unsigned StackOffset;
+  unsigned MaxStackArgAlign;
   SmallVector<uint32_t, 16> UsedRegs;
   SmallVector<CCValAssign, 4> PendingLocs;

   // ByValInfo and SmallVector<ByValInfo, 4> ByValRegs:
   //
   // Vector of ByValInfo instances (ByValRegs) is introduced for byval registers
   // tracking.
   // Or, in another words it tracks byval parameters that are stored in
@@ ... @@ void addLoc(const CCValAssign &V) {
     Locs.push_back(V);
   }

   LLVMContext &getContext() const { return Context; }
   MachineFunction &getMachineFunction() const { return MF; }
   CallingConv::ID getCallingConv() const { return CallingConv; }
   bool isVarArg() const { return IsVarArg; }

-  unsigned getNextStackOffset() const { return StackOffset; }
+  /// getNextStackOffset - Return the next stack offset such that all stack
+  /// slots satisfy their alignment requirements.
+  unsigned getNextStackOffset() const {
+    return StackOffset;
+  }
+
+  /// getAlignedCallFrameSize - Return the size of the call frame needed to
+  /// be able to store all arguments and such that the alignment requirement
+  /// of each of the arguments is satisfied.
+  unsigned getAlignedCallFrameSize() const {
+    return RoundUpToAlignment(StackOffset, MaxStackArgAlign);
+  }

   /// isAllocated - Return true if the specified register (or an alias) is
   /// allocated.
   bool isAllocated(unsigned Reg) const {
     return UsedRegs[Reg/32] & (1 << (Reg&31));
   }

   /// AnalyzeFormalArguments - Analyze an array of argument values,
@@ ... @@ unsigned AllocateReg(ArrayRef<MCPhysReg> Regs, const MCPhysReg *ShadowRegs) {
     MarkAllocated(ShadowReg);
     return Reg;
   }

   /// AllocateStack - Allocate a chunk of stack space with the specified size
   /// and alignment.
   unsigned AllocateStack(unsigned Size, unsigned Align) {
     assert(Align && ((Align - 1) & Align) == 0); // Align is power of 2.
-    StackOffset = ((StackOffset + Align - 1) & ~(Align - 1));
+    StackOffset = RoundUpToAlignment(StackOffset, Align);
     unsigned Result = StackOffset;
     StackOffset += Size;
+    MaxStackArgAlign = std::max(Align, MaxStackArgAlign);
     MF.getFrameInfo()->ensureMaxAlignment(Align);
     return Result;
   }

   /// Version of AllocateStack with extra register to be shadowed.
   unsigned AllocateStack(unsigned Size, unsigned Align, unsigned ShadowReg) {
     MarkAllocated(ShadowReg);
     return AllocateStack(Size, Align);

llvm/trunk/lib/CodeGen/CallingConvLower.cpp

@@ ... @@
 CCState::CCState(CallingConv::ID CC, bool isVarArg, MachineFunction &mf,
                  SmallVectorImpl<CCValAssign> &locs, LLVMContext &C)
     : CallingConv(CC), IsVarArg(isVarArg), MF(mf),
       TRI(*MF.getSubtarget().getRegisterInfo()), Locs(locs), Context(C),
       CallOrPrologue(Unknown) {
   // No stack is used.
   StackOffset = 0;
+  MaxStackArgAlign = 1;

   clearByValRegsInfo();
   UsedRegs.resize((TRI.getNumRegs()+31)/32);
 }

 /// Allocate space on the stack large enough to pass an argument by value.
 /// The size and alignment information of the argument is encoded in
 /// its parameter attribute.
@@ ... @@ static bool isValueTypeInRegForCC(CallingConv::ID CC, MVT VT) {
   if (CC == CallingConv::X86_VectorCall || CC == CallingConv::X86_FastCall)
     return true;
   return false;
 }

 void CCState::getRemainingRegParmsForType(SmallVectorImpl<MCPhysReg> &Regs,
                                           MVT VT, CCAssignFn Fn) {
   unsigned SavedStackOffset = StackOffset;
+  unsigned SavedMaxStackArgAlign = MaxStackArgAlign;
   unsigned NumLocs = Locs.size();

   // Set the 'inreg' flag if it is used for this calling convention.
   ISD::ArgFlagsTy Flags;
   if (isValueTypeInRegForCC(CallingConv, VT))
     Flags.setInReg();

   // Allocate something of this value type repeatedly until we get assigned a
@@ ... @@ #endif
   for (unsigned I = NumLocs, E = Locs.size(); I != E; ++I)
     if (Locs[I].isRegLoc())
       Regs.push_back(MCPhysReg(Locs[I].getLocReg()));

   // Clear the assigned values and stack memory. We leave the registers marked
   // as allocated so that future queries don't return the same registers, i.e.
   // when i64 and f64 are both passed in GPRs.
   StackOffset = SavedStackOffset;
+  MaxStackArgAlign = SavedMaxStackArgAlign;
   Locs.resize(NumLocs);
 }

 void CCState::analyzeMustTailForwardedRegisters(
     SmallVectorImpl<ForwardedRegister> &Forwards, ArrayRef<MVT> RegParmTypes,
     CCAssignFn Fn) {
   // Oftentimes calling conventions will not user register parameters for
   // variadic functions, so we need to assume we're not variadic so that we get

llvm/trunk/lib/Target/X86/X86FastISel.cpp

@@ ... @@ bool X86FastISel::fastLowerCall(CallLoweringInfo &CLI) {

   // Allocate shadow area for Win64
   if (IsWin64)
     CCInfo.AllocateStack(32, 8);

   CCInfo.AnalyzeCallOperands(OutVTs, OutFlags, CC_X86);

   // Get a count of how many bytes are to be pushed on the stack.
-  unsigned NumBytes = CCInfo.getNextStackOffset();
+  unsigned NumBytes = CCInfo.getAlignedCallFrameSize();

   // Issue CALLSEQ_START
   unsigned AdjStackDown = TII.getCallFrameSetupOpcode();
   BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(AdjStackDown))
     .addImm(NumBytes).addImm(0);

   // Walk the register/memloc assignments, inserting copies/loads.
   const X86RegisterInfo *RegInfo = Subtarget->getRegisterInfo();

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

@@ ... @@ X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,

   // Allocate shadow area for Win64
   if (IsWin64)
     CCInfo.AllocateStack(32, 8);

   CCInfo.AnalyzeCallOperands(Outs, CC_X86);

   // Get a count of how many bytes are to be pushed on the stack.
-  unsigned NumBytes = CCInfo.getNextStackOffset();
+  unsigned NumBytes = CCInfo.getAlignedCallFrameSize();
   if (IsSibcall)
     // This is a sibcall. The memory operands are available in caller's
     // own caller's stack.
     NumBytes = 0;
   else if (MF.getTarget().Options.GuaranteedTailCallOpt &&
            IsTailCallConvention(CallConv))
     NumBytes = GetAlignedArgumentStackSize(NumBytes, DAG);

llvm/trunk/test/CodeGen/X86/win32-spill-xmm.ll

+; RUN: llc -mcpu=generic -mtriple=i686-pc-windows-msvc -mattr=+sse < %s | FileCheck %s
+
+; Check proper alignment of spilled vector
+
+; CHECK-LABEL: spill_ok
+; CHECK: subl $32, %esp
+; CHECK: movaps %xmm3, (%esp)
+; CHECK: movl $0, 16(%esp)
+; CHECK: calll _bar
+define void @spill_ok(i32, <16 x float> *) {
+entry:
+  %2 = alloca i32, i32 %0
+  %3 = load <16 x float>, <16 x float> * %1, align 64
+  tail call void @bar(<16 x float> %3, i32 0) nounwind
+  ret void
+}
+
+declare void @bar(<16 x float> %a, i32 %b)
+
+; Check that proper alignment of spilled vector does not affect vargs
+
+; CHECK-LABEL: vargs_not_affected
+; CHECK: leal 28(%ebp), %eax
+define i32 @vargs_not_affected(<4 x float> %v, i8* %f, ...) {
+entry:
+  %ap = alloca i8*, align 4
+  %0 = bitcast i8** %ap to i8*
+  call void @llvm.va_start(i8* %0)
+  %argp.cur = load i8*, i8** %ap, align 4
+  %argp.next = getelementptr inbounds i8, i8* %argp.cur, i32 4
+  store i8* %argp.next, i8** %ap, align 4
+  %1 = bitcast i8* %argp.cur to i32*
+  %2 = load i32, i32* %1, align 4
+  call void @llvm.va_end(i8* %0)
+  ret i32 %2
+}
+
+declare void @llvm.va_start(i8*)
+
+declare void @llvm.va_end(i8*)

This is an archive of the discontinued LLVM Phabricator instance.

[Codegen] Ensure stack is properly aligned for call argument initialization
ClosedPublic
