Details

Reviewers

Commits

rG740f9d79ca7e: Arguments spilled on the stack before a function call may have alignment…
rL248786: Arguments spilled on the stack before a function call may have

Summary

Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example, if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

The x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple of the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)
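
The arithmetic behind the two `subl` variants can be checked with a small sketch. This is an illustration only: `isAlignedTo` and `adjustStack` are hypothetical helpers (not LLVM functions), and the starting stack pointer value `0x1000` is an assumed example of a 16-byte aligned %esp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM: true if Addr is a multiple of Align.
// Align is assumed to be a power of two.
constexpr bool isAlignedTo(std::uint32_t Addr, std::uint32_t Align) {
  return (Addr & (Align - 1)) == 0;
}

// Models `subl $Bytes, %esp`: returns the stack pointer after the adjustment.
constexpr std::uint32_t adjustStack(std::uint32_t Esp, std::uint32_t Bytes) {
  return Esp - Bytes;
}

// Assumed starting point: %esp == 0x1000, which is 16-byte aligned.
// Subtracting 20 yields 0xFEC (not a multiple of 16), so movaps to (%esp)
// would fault; subtracting 32 yields 0xFE0, which is still aligned.
static_assert(!isAlignedTo(adjustStack(0x1000, 20), 16), "20 breaks alignment");
static_assert(isAlignedTo(adjustStack(0x1000, 32), 16), "32 keeps alignment");
```

The 20 bytes come from the 16-byte spilled vector part plus the 4-byte `i32`; rounding that up to the largest argument alignment (16) gives the 32 seen in the fixed output.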

Diff Detail

Event Timeline

jketema updated this revision to Diff 33126. Aug 25 2015, 2:39 PM

jketema retitled this revision to [Codegen] Ensure stack is properly aligned for call argument initialization.

jketema updated this object.

jketema added a reviewer: rnk.

jketema added a subscriber: llvm-commits.

Herald added a subscriber: qcolombet. Aug 25 2015, 2:39 PM

rnk added inline comments. Aug 26 2015, 8:27 AM

include/llvm/CodeGen/CallingConvLower.h
204	I think this is really more like `MaxStackArgAlign`. It's the maximum alignment of arguments that ended up on the stack, right?
273	We should probably put some Doxygen here about what NextStackOffset really means.
274	Are you sure this is the right place to change? Isn't this method used to figure out where variadic argument packs start also? Shouldn't we not align StackOffset for that use case? I think it also gets used when we have a reserved stack frame (i.e. no dynamic alloca). As written, your code will overallocate some extra padding that isn't necessary.
275	Use RoundUpToAlignment.
406	RoundUpToAlignment here would be a nice cleanup.
409	This looks like `std::max(Align, MinStackAlign)` to me.
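
The cleanups suggested in these comments can be sketched as follows. This is a standalone illustration, not the LLVM code: `roundUpMask`, `roundUpDiv`, and `updateMaxAlign` are hypothetical names, and the division-based form is only assumed to compute the same result as a helper like `RoundUpToAlignment`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Bit-trick form used in the diff. Only valid when Align is a power of two.
constexpr std::uint64_t roundUpMask(std::uint64_t Value, std::uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Division-based form (hypothetical stand-in for a round-up helper).
// Works for any non-zero Align and agrees with roundUpMask for powers of two.
constexpr std::uint64_t roundUpDiv(std::uint64_t Value, std::uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Equivalent of `Align > Current ? Align : Current`, written with std::max.
inline unsigned updateMaxAlign(unsigned Current, unsigned Align) {
  return std::max(Align, Current);
}

static_assert(roundUpMask(20, 16) == roundUpDiv(20, 16), "forms agree");
```

Using a named helper mostly buys readability; the computed values are the same for the power-of-two alignments that `AllocateStack` asserts on.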

Thanks for your feedback! I'll address the easy bits, and have another look at the use of getNextStackOffset. I might need some help with that though, as I'm not very familiar with the code generation parts of llvm (still learning).

include/llvm/CodeGen/CallingConvLower.h
204	It was meant to be the minimum stack alignment I'll eventually need. I like your suggestion though.
274	I'll have another look at the use cases.

Addressed easy comments. getNextStackOffset concerns still to investigate (including adding Doxygen comment)

jketema added inline comments. Aug 26 2015, 3:57 PM

include/llvm/CodeGen/CallingConvLower.h
274–277	In the case of variadic arguments: can't we run into the same problems? If one of the arguments in the pack is a <4 x float> vector, then wouldn't there be cases where the X86 code generator uses a movaps instruction to load the vector into one of the XMM registers? Or will it always use movups in that case? And similarly for reserved stack frames?

Add Doxygen comment

Hi Reid,

Could you have another look at this? After having had a renewed look at the code, getNextStackOffset still seems the correct place for this change to me for the reasons I provided in the inline comments. However, if you think this really belongs somewhere else, would you be able to point me in the right direction?

Thanks,

Jeroen

Bump

rnk added inline comments. Sep 8 2015, 2:46 PM

include/llvm/CodeGen/CallingConvLower.h
274–277	Yes, but here's the 32-bit test case I'm imagining: `void f(__m128 v, const char *f, ...) { va_list ap; va_start(ap, f); int d = va_arg(ap, int); }` Do we have padding between 'f' and the variadic int parameter? I think the answer should be no, because if they were not variadic, they would not have padding between them.

bump

jketema added inline comments. Sep 15 2015, 1:35 PM

include/llvm/CodeGen/CallingConvLower.h
274–277	Sorry, I missed this before I bumped. Somehow the system didn't send me a notification email. I don't see why this is a problem: my patch only changes the alignment of the whole frame, not the padding between the individual members; the computation for that stays unchanged. Hence, there will only be additional padding after the very last of the variadic parameters supplied.

jketema added inline comments. Sep 15 2015, 1:48 PM

include/llvm/CodeGen/CallingConvLower.h
274–277	Reading back all the comments, I think I see what you're getting at: What you're saying is that `getNextStackOffset` is also used to compute the offset of the variadic arguments with respect to the beginning of the stack frame? If that's the case, then the patch would indeed be wrong, and I think we would then need both the old variant and the new one I'm proposing and adapt every use accordingly. Or do you see a better solution?

rnk added inline comments. Sep 15 2015, 1:52 PM

include/llvm/CodeGen/CallingConvLower.h
274–277	Exactly, `getNextStackOffset` currently has more than one use. You should take the example C code above and turn it into a test case for lit to show that at least this use isn't changing. Given that there is more than one use, I think we should introduce a new method called `getAlignedCallFrameSize`, update the appropriate call sites that use this in PEI, and declare victory. :)
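
The split into two accessors discussed here might look roughly like the sketch below. It is a simplified model, not the committed code: `StackLayoutModel` is a hypothetical stand-in for `CCState` that keeps only the stack bookkeeping, and the member name `MaxStackArgAlign` follows the rename suggested earlier in the review.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-in for llvm::CCState (stack bookkeeping only).
class StackLayoutModel {
  unsigned StackOffset = 0;      // Bytes of argument stack used so far.
  unsigned MaxStackArgAlign = 1; // Largest alignment of any stack argument.

public:
  // Aligns the running offset, reserves Size bytes, records the alignment,
  // and returns the offset assigned to this argument.
  unsigned allocateStack(unsigned Size, unsigned Align) {
    assert(Align && ((Align - 1) & Align) == 0 && "Align must be a power of 2");
    StackOffset = (StackOffset + Align - 1) & ~(Align - 1);
    unsigned Result = StackOffset;
    StackOffset += Size;
    MaxStackArgAlign = std::max(Align, MaxStackArgAlign);
    return Result;
  }

  // Unpadded offset: where the next argument (for example, the start of a
  // variadic pack) would be placed. Unchanged by the alignment fix.
  unsigned getNextStackOffset() const { return StackOffset; }

  // Padded size: the amount by which to move the stack pointer for a call,
  // rounded up to the largest argument alignment seen.
  unsigned getAlignedCallFrameSize() const {
    return (StackOffset + MaxStackArgAlign - 1) & ~(MaxStackArgAlign - 1);
  }
};
```

For a 16-byte vector followed by a 4-byte integer, `getNextStackOffset` in this model stays at 20 (so a following variadic argument gets no extra padding), while `getAlignedCallFrameSize` returns 32 for the stack pointer adjustment.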

Hi Reid,

I've introduced a getAlignedCallFrameSize method as suggested and changed call lowering in the X86 target appropriately. I've also introduced a test involving var args, as you suggested.

The X86 target also contains a call to getNextStackOffset in the code that determines whether tail call optimization can be performed. I'm not sure if that call should also be changed to getAlignedCallFrameSize. I'm also wondering whether getAlignedCallFrameSize should be introduced in other targets too.

Bump

lgtm

test/CodeGen/X86/win32-spill-xmm.ll
9	; CHECK: calll bar

This revision is now accepted and ready to land. Sep 28 2015, 11:40 AM

Closed by commit rL248786: Arguments spilled on the stack before a function call may have (authored by jketema). Sep 29 2015, 3:23 AM

This revision was automatically updated to reflect the committed changes.

Thanks for all your feedback Reid!

pengfei mentioned this in D114536: [X86][MS] Fix the wrong alignment of vector variable arguments on Win32. Jan 16 2022, 7:52 AM

Diff 33126

include/llvm/CodeGen/CallingConvLower.h

[195 lines not shown, context: private:]
   CallingConv::ID CallingConv;
   bool IsVarArg;
   MachineFunction &MF;
   const TargetRegisterInfo &TRI;
   SmallVectorImpl<CCValAssign> &Locs;
   LLVMContext &Context;

   unsigned StackOffset;
+  unsigned MinStackAlign;
   SmallVector<uint32_t, 16> UsedRegs;
   SmallVector<CCValAssign, 4> PendingLocs;

   // ByValInfo and SmallVector<ByValInfo, 4> ByValRegs:
   //
   // Vector of ByValInfo instances (ByValRegs) is introduced for byval registers
   // tracking.
   // Or, in another words it tracks byval parameters that are stored in
[52 lines not shown, context: public:]
   void addLoc(const CCValAssign &V) {
     Locs.push_back(V);
   }

   LLVMContext &getContext() const { return Context; }
   MachineFunction &getMachineFunction() const { return MF; }
   CallingConv::ID getCallingConv() const { return CallingConv; }
   bool isVarArg() const { return IsVarArg; }

-  unsigned getNextStackOffset() const { return StackOffset; }
+  unsigned getNextStackOffset() const {
+    return ((StackOffset + MinStackAlign - 1) & ~(MinStackAlign - 1));
+  }

   /// isAllocated - Return true if the specified register (or an alias) is
   /// allocated.
   bool isAllocated(unsigned Reg) const {
     return UsedRegs[Reg/32] & (1 << (Reg&31));
   }

   /// AnalyzeFormalArguments - Analyze an array of argument values,
   /// incorporating info about the formals into this state.
[112 lines not shown, context: unsigned AllocateReg(ArrayRef<MCPhysReg> Regs, const MCPhysReg *ShadowRegs) {]
     MarkAllocated(ShadowReg);
     return Reg;
   }

   /// AllocateStack - Allocate a chunk of stack space with the specified size
   /// and alignment.
   unsigned AllocateStack(unsigned Size, unsigned Align) {
     assert(Align && ((Align - 1) & Align) == 0); // Align is power of 2.
     StackOffset = ((StackOffset + Align - 1) & ~(Align - 1));
     unsigned Result = StackOffset;
     StackOffset += Size;
+    MinStackAlign = Align > MinStackAlign ? Align : MinStackAlign;
     MF.getFrameInfo()->ensureMaxAlignment(Align);
     return Result;
   }

   /// Version of AllocateStack with extra register to be shadowed.
   unsigned AllocateStack(unsigned Size, unsigned Align, unsigned ShadowReg) {
     MarkAllocated(ShadowReg);
     return AllocateStack(Size, Align);
[93 lines not shown]

lib/CodeGen/CallingConvLower.cpp

[26 lines not shown]

 CCState::CCState(CallingConv::ID CC, bool isVarArg, MachineFunction &mf,
                  SmallVectorImpl<CCValAssign> &locs, LLVMContext &C)
     : CallingConv(CC), IsVarArg(isVarArg), MF(mf),
       TRI(*MF.getSubtarget().getRegisterInfo()), Locs(locs), Context(C),
       CallOrPrologue(Unknown) {
   // No stack is used.
   StackOffset = 0;
+  MinStackAlign = 1;

   clearByValRegsInfo();
   UsedRegs.resize((TRI.getNumRegs()+31)/32);
 }

 /// Allocate space on the stack large enough to pass an argument by value.
 /// The size and alignment information of the argument is encoded in
 /// its parameter attribute.
[144 lines not shown, context: static bool isValueTypeInRegForCC(CallingConv::ID CC, MVT VT) {]
   if (CC == CallingConv::X86_VectorCall || CC == CallingConv::X86_FastCall)
     return true;
   return false;
 }

 void CCState::getRemainingRegParmsForType(SmallVectorImpl<MCPhysReg> &Regs,
                                           MVT VT, CCAssignFn Fn) {
   unsigned SavedStackOffset = StackOffset;
+  unsigned SavedMinStackAlign = MinStackAlign;
   unsigned NumLocs = Locs.size();

   // Set the 'inreg' flag if it is used for this calling convention.
   ISD::ArgFlagsTy Flags;
   if (isValueTypeInRegForCC(CallingConv, VT))
     Flags.setInReg();

   // Allocate something of this value type repeatedly until we get assigned a
[15 lines not shown, context: #endif]
   for (unsigned I = NumLocs, E = Locs.size(); I != E; ++I)
     if (Locs[I].isRegLoc())
       Regs.push_back(MCPhysReg(Locs[I].getLocReg()));

   // Clear the assigned values and stack memory. We leave the registers marked
   // as allocated so that future queries don't return the same registers, i.e.
   // when i64 and f64 are both passed in GPRs.
   StackOffset = SavedStackOffset;
+  MinStackAlign = SavedMinStackAlign;
   Locs.resize(NumLocs);
 }

 void CCState::analyzeMustTailForwardedRegisters(
     SmallVectorImpl<ForwardedRegister> &Forwards, ArrayRef<MVT> RegParmTypes,
     CCAssignFn Fn) {
   // Oftentimes calling conventions will not user register parameters for
   // variadic functions, so we need to assume we're not variadic so that we get
[14 lines not shown]

test/CodeGen/X86/aligned-variadic.ll

[9 lines not shown, context: entry:]
   %va = alloca [1 x %struct.__va_list_tag], align 16
   %arraydecay = getelementptr inbounds [1 x %struct.__va_list_tag], [1 x %struct.__va_list_tag]* %va, i64 0, i64 0
   %arraydecay1 = bitcast [1 x %struct.__va_list_tag]* %va to i8*
   call void @llvm.va_start(i8* %arraydecay1)
   %overflow_arg_area_p = getelementptr inbounds [1 x %struct.__va_list_tag], [1 x %struct.__va_list_tag]* %va, i64 0, i64 0, i32 2
   %overflow_arg_area = load i8*, i8** %overflow_arg_area_p, align 8
   %overflow_arg_area.next = getelementptr i8, i8* %overflow_arg_area, i64 24
   store i8* %overflow_arg_area.next, i8** %overflow_arg_area_p, align 8
-; X32: leal 68(%esp), [[REG:%.*]]
+; X32: leal 72(%esp), [[REG:%.*]]
 ; X32: movl [[REG]], 16(%esp)
 ; X64: leaq 232(%rsp), [[REG:%.*]]
 ; X64: movq [[REG]], 184(%rsp)
 ; X64: leaq 176(%rsp), %rdi
   call void @qux(%struct.__va_list_tag* %arraydecay)
   ret void
 }

 ; Function Attrs: nounwind
 declare void @llvm.va_start(i8*)

 declare void @qux(%struct.__va_list_tag*)

test/CodeGen/X86/win32-spill-xmm.ll

+; RUN: llc -mcpu=generic -mtriple=i686-pc-windows-msvc -mattr=+sse < %s | FileCheck %s
+; CHECK: subl $32, %esp
+; CHECK: movaps %xmm3, (%esp)
+; CHECK: movl $0, 16(%esp)
+
+declare void @bar(<16 x float> %a, i32 %b) nounwind
+
+define void @foo(i32, <16 x float> * nocapture readonly) nounwind {
+entry:
+  %2 = alloca i32, i32 %0
+  %3 = load <16 x float>, <16 x float> * %1, align 64
+  tail call void @bar(<16 x float> %3, i32 0) nounwind
+  ret void
+}

This is an archive of the discontinued LLVM Phabricator instance.

[Codegen] Ensure stack is properly aligned for call argument initialization
ClosedPublic
