This is an archive of the discontinued LLVM Phabricator instance.

Differential D114536

[X86][MS] Fix the wrong alignment of vector variable arguments on Win32
Closed, Public

Authored by pengfei on Nov 24 2021, 7:06 AM.

Details

Reviewers

rnk
LiuChen3
LuoYuanke

Commits

rG2aa732a9183b: [X86][MS] Fix the wrong alignment of vector variable arguments on Win32

Summary

D108887 fixed alignment mismatch by changing the caller's alignment in
ABI. However, we found some cases that still assume the alignment is
vector size. This patch fixes them to avoid the runtime crash.
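To make the failure mode concrete, here is a minimal standalone sketch (not LLVM code) of why assuming vector-size alignment breaks on a stack that is only guaranteed to be 4-byte aligned. The helpers `isAligned` and `slotAddress` and the example stack pointer value are hypothetical, chosen only for illustration. An aligned SSE move such as `movaps` faults when its memory operand is not 16-byte aligned, which is the kind of runtime crash the summary refers to.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `addr` is a multiple of `align`.
// `align` must be a power of two.
inline bool isAligned(std::uint64_t addr, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "align must be a power of two");
  return (addr & (align - 1)) == 0;
}

// Hypothetical helper: address of the argument slot located `offset`
// bytes above the stack pointer `esp`.
inline std::uint64_t slotAddress(std::uint64_t esp, std::uint64_t offset) {
  return esp + offset;
}
```

With a made-up stack pointer of 0x1004, the slot at `8(%esp)` is 0x100C: it satisfies a 4-byte alignment check but not a 16-byte one, so only an unaligned move such as `movups` is safe there.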

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pengfei created this revision. Nov 24 2021, 7:06 AM

Herald added a subscriber: hiraditya. Nov 24 2021, 7:06 AM

pengfei requested review of this revision. Nov 24 2021, 7:06 AM

Herald added a project: Restricted Project. Nov 24 2021, 7:06 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B135839: Diff 389489. Nov 24 2021, 7:51 AM

rnk added inline comments. Nov 24 2021, 1:50 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
3475	I would prefer to see if we can avoid changing the prototype here. The target checks (is win32) can be calculated internally by looking at the subtarget. The only thing that varies per call site is the `IsVarArg` part. If we have to change the prototype, please just pass `IsVarArg`.
4092	Ditto. In this case, we are accumulating consecutive boolean parameters which can reduce readability as well. An alternative solution would be nicer.
4101	Surely there are other kinds of parameters that are clamped to 4 byte alignment on win32. How are doubles handled? Can we handle them the same way?
llvm/test/CodeGen/X86/vaargs-win32.ll
41	As I understand it, clang will not generate this IR. It will either mark the vector with `inreg`, or it will pass it indirectly (`<4 x float>*`).
97	I would like to see a test case where we set up a call that has intentionally misaligned parameters, so a C prototype that looks like `void f(v4f, int, v4f, ...)`. This really underscores the need to use `movups`, because the ABI requires the data to be unaligned.
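The layout rnk describes can be sketched with a small standalone model (not LLVM code). It assumes, as the thread does for Win32, that each stack-passed argument is only rounded up to a 4-byte slot boundary; `layoutStackArgs` is a hypothetical helper, and the sizes 16/4/16 stand for `v4f`, `int`, `v4f`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assigns each argument the next offset that is a multiple of `slotAlign`
// (a power of two) and returns the resulting offsets in argument order.
inline std::vector<std::size_t>
layoutStackArgs(const std::vector<std::size_t> &sizes, std::size_t slotAlign) {
  assert(slotAlign != 0 && (slotAlign & (slotAlign - 1)) == 0 &&
         "slotAlign must be a power of two");
  std::vector<std::size_t> offsets;
  std::size_t next = 0;
  for (std::size_t size : sizes) {
    // Round the running offset up to the slot alignment.
    next = (next + slotAlign - 1) & ~(slotAlign - 1);
    offsets.push_back(next);
    next += size;
  }
  return offsets;
}
```

Under these assumptions the two vectors land 20 bytes apart, so they cannot both be 16-byte aligned regardless of how the stack pointer itself is aligned. That is why such a prototype forces unaligned moves.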

Address @rnk 's comments. Thanks for the review!

llvm/lib/Target/X86/X86ISelLowering.cpp
4101	I think `double` might be aligned to 8 too. But I think ignoring the `double` handling should be OK: there is no difference between an alignment of 4 and an alignment of 8, because we don't have aligned instructions for it.
llvm/test/CodeGen/X86/vaargs-win32.ll
41	Yeah, it did have `inreg` when generated. I removed it because I thought it's not precise here. Adding it back.
97	We have a similar one above, see `testPastArguments`. It checks `(int, v4f)`. I think it should be ok. OTOH, we can also check whether or not to use `movups` by checking the stack realignment code `andl $-16, %esp`.

Harbormaster completed remote builds in B136061: Diff 389776. Nov 25 2021, 7:46 AM

rnk added inline comments. Nov 30 2021, 11:41 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
4101	I think it's important to make sure we have the right alignment values for all types, regardless of whether they have aligned instructions or not. LLVM uses alignment aggressively, so we need to be precise everywhere. I'd still like to see if we can make this condition less target-specific. What's special about win32 in this case is that we only have 4 byte stack alignment. There are other platforms where this is true as well: i686-darwin for example. What is the effect of passing `Subtarget.getFrameLowering()->getStackAlign()` in place of the MaybeAlign parameter? Presumably it would cause some test failures, but maybe we actually want that behavior. If we do this, do we actually need the IsVarArg parameter at all? To me, it seems unlikely that whether a prototype has varargs or not should affect the way that prototyped arguments are passed. I believe such vectors passed in memory should be passed indirectly.

pengfei added inline comments. Dec 9 2021, 5:23 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
4101	Sorry for the late reply.
> What's special about win32 in this case is that we only have 4 byte stack alignment.
That's true, but some variables have their own alignment. See the example in f2. https://godbolt.org/z/xqcvj4YoK I have this impression since I met the problem previously, but I'm not sure for other types. Seems double is still aligned to 4 byte.
> There are other platforms where this is true as well: i686-darwin for example.
Seems not. At least `f80` is aligned to 16 bytes. I just fixed the issue in D113739.
> do we actually need the IsVarArg parameter at all?
Yes, because the same vector has different alignment between variant and fixed argument.

rnk added inline comments. Dec 13 2021, 4:48 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
4101	Yes, LLVM will realign the stack to store values with high required alignment. I'm saying that, this code, which stores an argument to stack memory, should maybe clamp its alignment assumption to the ABI's stack alignment, which on Windows, happens to be 4. On most other platforms, it will be 16. That seems equivalent to your logic, and more general. I'm asking if this suggestion causes problems in practice. Maybe it causes widespread test failures, I can't say for sure. In any case, I'd like to see a more principled solution.
> Yes, because the same vector has different alignment between variant and fixed argument.
I think what I'm trying to get at is that, in these two prototypes, `v` should be passed identically:
    void f1(int x, v4f v, int y);
    void f2(int x, v4f v, int y, ...);
The IsVarArg boolean affects the entire call site. Adding and removing an ellipsis to the prototype should not change the instructions used to store the prototyped arguments, they should remain the same, and use the regular alignment assumptions, right?

Sorry, I made a mistake when I wanted to demonstrate the difference between variadic and fixed arguments. Yes, you are right. The alignment I showed in f2 is that of the store of the variable, not the ABI's.
A summary from my latest investigation:

For fixed function arguments:
- LLVM will pass the first 3 vector variables by register: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td#L804
- LLVM will pass the following vector variables by value with alignment = 4: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td#L788
- Clang FE will emit the address of the value instead of the value itself, so we don't have a chance to handle the alignment. https://godbolt.org/z/KvvY4hMda
- This matches what MSVC does.
For variadic arguments:
- MSVC allows at most 3 vector variables, no matter whether they appear to the left of the comma or in the ellipsis of the prototype: https://godbolt.org/z/sYbcheEjv
- MSVC always uses the stack to pass vector variables. The alignment for the vector variables is 4.

> I think what I'm trying to get at is that, in these two prototypes, v should be passed identically:

No, they are not. In a fixed-argument function, vector variables are passed in registers or by address, while in a variadic function, vector variables are limited to 3 and passed on the stack.

> I'm saying that, this code, which stores an argument to stack memory, should maybe clamp its alignment assumption to the ABI's stack alignment, which on Windows, happens to be 4. On most other platforms, it will be 16.

We have specified each type's alignment in the calling convention when it is passed on the stack: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td#L841
You can see that on other 32-bit platforms the type alignments are almost all 4 too. The only exception is f80. Anyway, we don't need to worry about the fixed arguments here.

> The IsVarArg boolean affects the entire call site.

Yes, because all vector variables are passed on the stack with alignment = 4, no matter whether they are to the left or to the right of the comma. It is intended to affect the entire call site.

> Adding and removing an ellipsis to the prototype should not change the instructions used to store the prototyped arguments, they should remain the same, and use the regular alignment assumptions, right?

No. Adding or removing an ellipsis results in totally different ways of passing arguments. We need IsVarArg to distinguish them.
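The passing rules summarized in this comment can be restated as a tiny standalone classifier. This is only a model of the comment's claims about 32-bit MSVC targets, not LLVM or Clang code; `VectorPassing` and `classifyVectorArg` are hypothetical names, and the MSVC limit of three vectors in a variadic call is not modeled.

```cpp
#include <cassert>

// How a vector argument is passed, per the summary in the comment above.
enum class VectorPassing { InRegister, ByAddress, OnStackAlign4 };

// `vectorIndex` is the zero-based position of this argument among the
// vector arguments of the call (hypothetical parameter for illustration).
inline VectorPassing classifyVectorArg(bool isVarArg, unsigned vectorIndex) {
  if (isVarArg)
    return VectorPassing::OnStackAlign4; // always on the stack, align 4
  // Fixed prototype: the first three vectors go in registers, and the
  // frontend passes any further ones by address.
  return vectorIndex < 3 ? VectorPassing::InRegister
                         : VectorPassing::ByAddress;
}
```

In this model a by-value vector load or store with less than 16-byte alignment only arises in the variadic case, which is the case the patch is concerned with.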

Thanks for the info, sorry about the delayed response. I see that adding the ellipsis drastically changes the way vectors are passed, but I think that complexity lives in the frontend. If the function is vararg, the frontend (Clang or other) will pass the vector directly. If it has a fixed prototype, the vector should be passed by address after passing three vectors.

There are no cases when a vector passed on the stack is aligned to 16 bytes, it should always be four byte aligned. Therefore, I don't think we need the IsVarArg boolean, we can go ahead and clamp the alignment on these argument loads.
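The clamping idea discussed here can be sketched independently of LLVM. The helpers below are hypothetical (`clampToStackAlign`, `pickVectorMove`) and the instruction choice is simplified to the two SSE moves that appear in the tests: an aligned move is only legal when at least 16-byte alignment is known.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// The alignment that may be assumed for a stack argument slot: never more
// than the ABI stack alignment, never more than the type's own alignment.
inline unsigned clampToStackAlign(unsigned typeAlign, unsigned stackAlign) {
  assert(typeAlign != 0 && stackAlign != 0 && "alignments must be nonzero");
  return std::min(typeAlign, stackAlign);
}

// Simplified instruction choice for a 128-bit vector load or store.
inline std::string pickVectorMove(unsigned knownAlign) {
  return knownAlign >= 16 ? "movaps" : "movups";
}
```

With a 16-byte vector and the 4-byte stack alignment quoted for Win32, the clamp yields 4 and the model picks `movups`. With a 16-byte stack alignment it yields 16 and `movaps` remains valid.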

Remove IsVarArg boolean.

> There are no cases when a vector passed on the stack is aligned to 16 bytes, it should always be four byte aligned.

Unfortunately, there is one. See the changes in win32-spill-xmm.ll.

The good news is that it doesn't look like a reasonable test. See https://godbolt.org/z/hdsPTsbPW
MSVC doesn't pass a 512-bit argument in that way, while Clang FE passes it by pointer.
It seems D12337 isn't a valid patch, so we can ignore the change in win32-spill-xmm.ll.
We may need to consider fixing the incompatibility between Clang and MSVC, but it is not related to this patch anyway.

Herald added a subscriber: qcolombet. Jan 16 2022, 7:52 AM

Harbormaster completed remote builds in B143677: Diff 400382. Jan 16 2022, 8:48 AM

lgtm

Sorry for the delay, I was out sick, and this fell out of my inbox.

llvm/lib/Target/X86/X86ISelLowering.cpp
4100	I think this could be simplified to use getStackAlign, but I won't insist.

This revision is now accepted and ready to land. Feb 2 2022, 10:22 AM

Thanks for the review. It's all right. Take care!

llvm/lib/Target/X86/X86ISelLowering.cpp
4100	I don't think so. The alignments are inconsistent across types and OSes. Take CC_X86_64_C for example, https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td#L592-L603: f80/f128 are determined by layout and vector types are aligned to their size.
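The condition in the committed diff can be restated as a standalone predicate, which also shows why it is not the same as a blanket "use the stack alignment" rule: `f80` is excluded and every other target is left alone. This is a model only; `ArgType` and `win32ArgAlignOverride` are hypothetical names, and the real code uses `Subtarget.isTargetWindowsMSVC()`, `Subtarget.is64Bit()` and `MVT::f80` inside `X86ISelLowering.cpp`.

```cpp
#include <cassert>

// A few representative argument types (hypothetical enum for illustration).
enum class ArgType { I32, F64, F80, V4F32 };

// Returns the alignment override in bytes for a stack-passed argument, or 0
// when no override applies and the default alignment should be used.
inline unsigned win32ArgAlignOverride(bool isWindowsMSVC, bool is64Bit,
                                      ArgType ty) {
  if (isWindowsMSVC && !is64Bit && ty != ArgType::F80)
    return 4;
  return 0;
}
```

In the patch, the "no override" case corresponds to leaving the `MaybeAlign` value unset when it is handed to `DAG.getLoad` or `DAG.getStore`.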

This revision was landed with ongoing or failed builds. Feb 12 2022, 6:23 PM

Closed by commit rG2aa732a9183b: [X86][MS] Fix the wrong alignment of vector variable arguments on Win32 (authored by pengfei).

This revision was automatically updated to reflect the committed changes.

pengfei added a commit: rG2aa732a9183b: [X86][MS] Fix the wrong alignment of vector variable arguments on Win32.

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelLowering.cpp	14 lines
llvm/test/CodeGen/X86/vaargs-win32.ll	8 lines
llvm/test/CodeGen/X86/win32-spill-xmm.ll	2 lines

Diff 408227

llvm/lib/Target/X86/X86ISelLowering.cpp

(3,466 lines not shown)
 X86TargetLowering::LowerMemArgument(SDValue Chain, CallingConv::ID CallConv,
                                     const SmallVectorImpl<ISD::InputArg> &Ins,
                                     const SDLoc &dl, SelectionDAG &DAG,
                                     const CCValAssign &VA,
                                     MachineFrameInfo &MFI, unsigned i) const {
   // Create the nodes corresponding to a load from this parameter slot.
   ISD::ArgFlagsTy Flags = Ins[i].Flags;
   bool AlwaysUseMutable = shouldGuaranteeTCO(
       CallConv, DAG.getTarget().Options.GuaranteedTailCallOpt);
   bool isImmutable = !AlwaysUseMutable && !Flags.isByVal();
   EVT ValVT;
   MVT PtrVT = getPointerTy(DAG.getDataLayout());

   // If value is passed by pointer we have address passed instead of the value
   // itself. No need to extend if the mask value and location share the same
   // absolute size.
   bool ExtendedInMem =
(80 lines not shown)

   // Set SExt or ZExt flag.
   if (VA.getLocInfo() == CCValAssign::ZExt) {
     MFI.setObjectZExt(FI, true);
   } else if (VA.getLocInfo() == CCValAssign::SExt) {
     MFI.setObjectSExt(FI, true);
   }

+  MaybeAlign Alignment;
+  if (Subtarget.isTargetWindowsMSVC() && !Subtarget.is64Bit() &&
+      ValVT != MVT::f80)
+    Alignment = MaybeAlign(4);
   SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
   SDValue Val = DAG.getLoad(
       ValVT, dl, Chain, FIN,
-      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI));
+      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI),
+      Alignment);
   return ExtendedInMem
              ? (VA.getValVT().isVector()
                     ? DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VA.getValVT(), Val)
                     : DAG.getNode(ISD::TRUNCATE, dl, VA.getValVT(), Val))
              : Val;
 }

 // FIXME: Get this from tablegen.
(495 lines not shown)
 }

 SDValue X86TargetLowering::LowerMemOpCallTo(SDValue Chain, SDValue StackPtr,
                                             SDValue Arg, const SDLoc &dl,
                                             SelectionDAG &DAG,
                                             const CCValAssign &VA,
                                             ISD::ArgFlagsTy Flags,
                                             bool isByVal) const {
   unsigned LocMemOffset = VA.getLocMemOffset();
   SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset, dl);
   PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()),
                        StackPtr, PtrOff);
   if (isByVal)
     return CreateCopyOfByValArgument(Arg, PtrOff, Chain, Flags, DAG, dl);

+  MaybeAlign Alignment;
+  if (Subtarget.isTargetWindowsMSVC() && !Subtarget.is64Bit() &&
+      Arg.getSimpleValueType() != MVT::f80)
+    Alignment = MaybeAlign(4);
   return DAG.getStore(
       Chain, dl, Arg, PtrOff,
-      MachinePointerInfo::getStack(DAG.getMachineFunction(), LocMemOffset));
+      MachinePointerInfo::getStack(DAG.getMachineFunction(), LocMemOffset),
+      Alignment);
 }

 /// Emit a load of return address if tail call
 /// optimization is performed and it is required.
 SDValue X86TargetLowering::EmitTailCallLoadRetAddr(
     SelectionDAG &DAG, SDValue &OutRetAddr, SDValue Chain, bool IsTailCall,
     bool Is64Bit, int FPDiff, const SDLoc &dl) const {
   // Adjust the Return address stack slot.
(51,292 lines not shown)

llvm/test/CodeGen/X86/vaargs-win32.ll

(28 lines not shown)
 ; MINGW-NEXT: popl %ebp
 ; MINGW-NEXT: retl
 entry:
   %0 = load <4 x float>, <4 x float>* @a, align 16
   %call = tail call i32 (i32, ...) @testm128(i32 1, <4 x float> inreg %0)
   ret void
 }

-define <4 x i32> @foo(<4 x float> %0, ...) nounwind {
+define <4 x i32> @foo(<4 x float> inreg %0, ...) nounwind {
 ; MSVC-LABEL: foo:
 ; MSVC: # %bb.0:
 ; MSVC-NEXT: pushl %eax
-; MSVC-NEXT: movaps 8(%esp), %xmm0
+; MSVC-NEXT: movups 8(%esp), %xmm0
 ; MSVC-NEXT: movups 24(%esp), %xmm1
 ; MSVC-NEXT: cmpltps %xmm1, %xmm0
 ; MSVC-NEXT: popl %eax
 ; MSVC-NEXT: retl
 ;
 ; MINGW-LABEL: foo:
 ; MINGW: # %bb.0:
 ; MINGW-NEXT: pushl %ebp
(18 lines not shown)
   ret <4 x i32> %7
 }

 define <4 x i32> @bar() nounwind {
 ; MSVC-LABEL: bar:
 ; MSVC: # %bb.0:
 ; MSVC-NEXT: subl $32, %esp
 ; MSVC-NEXT: movaps {{.*#+}} xmm0 = [5.0E+0,6.0E+0,7.0E+0,8.0E+0]
-; MSVC-NEXT: movaps %xmm0, 16(%esp)
+; MSVC-NEXT: movups %xmm0, 16(%esp)
 ; MSVC-NEXT: movaps {{.*#+}} xmm0 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
-; MSVC-NEXT: movaps %xmm0, (%esp)
+; MSVC-NEXT: movups %xmm0, (%esp)
 ; MSVC-NEXT: calll _foo
 ; MSVC-NEXT: addl $32, %esp
 ; MSVC-NEXT: retl
 ;
 ; MINGW-LABEL: bar:
 ; MINGW: # %bb.0:
 ; MINGW-NEXT: pushl %ebp
 ; MINGW-NEXT: movl %esp, %ebp
 ; MINGW-NEXT: andl $-16, %esp
 ; MINGW-NEXT: subl $48, %esp
 ; MINGW-NEXT: movaps {{.*#+}} xmm0 = [5.0E+0,6.0E+0,7.0E+0,8.0E+0]
 ; MINGW-NEXT: movaps %xmm0, 16(%esp)
 ; MINGW-NEXT: movaps {{.*#+}} xmm0 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
 ; MINGW-NEXT: movaps %xmm0, (%esp)
 ; MINGW-NEXT: calll _foo
 ; MINGW-NEXT: movl %ebp, %esp
 ; MINGW-NEXT: popl %ebp
 ; MINGW-NEXT: retl
   %1 = tail call <4 x i32> (<4 x float>, ...) @foo(<4 x float> <float 1.000000e+00, float 2.000000e+00, float 3.000000e+00, float 4.000000e+00>, <4 x float> <float 5.000000e+00, float 6.000000e+00, float 7.000000e+00, float 8.000000e+00>)
   ret <4 x i32> %1
 }

 declare i32 @testm128(i32, ...) nounwind
 declare void @llvm.va_start(i8*)
 declare void @llvm.lifetime.start.p0i8(i64, i8*)
 declare void @llvm.lifetime.end.p0i8(i64, i8*)

llvm/test/CodeGen/X86/win32-spill-xmm.ll

 ; RUN: llc -mcpu=generic -mtriple=i686-pc-windows-msvc -mattr=+sse < %s | FileCheck %s

 ; Check proper alignment of spilled vector

 ; CHECK-LABEL: spill_ok
 ; CHECK: subl $32, %esp
-; CHECK: movaps %xmm3, (%esp)
+; CHECK: movups %xmm3, (%esp)
 ; CHECK: movl $0, 16(%esp)
 ; CHECK: calll _bar
 define void @spill_ok(i32, <16 x float> *) {
 entry:
   %2 = alloca i32, i32 %0
   %3 = load <16 x float>, <16 x float> * %1, align 64
   tail call void @bar(<16 x float> %3, i32 0) nounwind
   ret void
(25 lines not shown)