This is an archive of the discontinued LLVM Phabricator instance.

Paths

- clang/include/clang/CodeGen/CGFunctionInfo.h
- clang/lib/CodeGen/CGCall.cpp
- clang/lib/CodeGen/TargetInfo.cpp
- clang/test/CodeGen/x86_32-arguments-win32.c
- clang/test/CodeGenCXX/inalloca-overaligned.cpp
- clang/test/CodeGenCXX/inalloca-vector.cpp

Differential D72114

[MS] Overhaul how clang passes overaligned args on x86_32
Closed · Public

Authored by rnk on Jan 2 2020, 3:25 PM.


Details

Reviewers

rjmccall
craig.topper
erichkeane

Commits

rG2af74e27ed7d: [MS] Overhaul how clang passes overaligned args on x86_32

Summary

MSVC 2013 would refuse to pass highly aligned things (typically vectors
and aggregates) by value. Users would receive this error:

t.cpp(11) : error C2719: 'w': formal parameter with __declspec(align('32')) won't be aligned
t.cpp(11) : error C2719: 'q': formal parameter with __declspec(align('32')) won't be aligned

However, in MSVC 2015, this behavior was changed, and highly aligned
things are now passed indirectly. To avoid breaking backwards
compatibility, objects that do not have a *required* high alignment
(e.g. double) are still passed directly, even though they are not
naturally aligned. This change implements the new behavior of passing
things indirectly.

The new behavior is:

- up to three vector parameters can be passed in [XYZ]MM0-2
- remaining arguments with required alignment greater than 4 bytes are passed indirectly
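A minimal sketch of the rule above as a standalone C++ classifier. Everything here (`ArgDesc`, `ArgClass`, `classifyWin32Args`) is hypothetical and illustrative, not clang's `X86_32ABIInfo` code. It assumes every vector type is treated alike regardless of width, and that a vector which finds no free register is passed indirectly, as the TargetInfo.cpp comment in the diff says. The `alignIsRequired` flag models the difference between a *required* alignment (from `alignas` or `__declspec(align)`) and a merely natural one such as `double`'s; the 4-byte threshold corresponds to the diff's `TI.Align > 32` check, which is in bits.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification result for one argument.
enum class ArgClass { VectorReg, Indirect, DirectOnStack };

// Hypothetical description of one parameter's type.
struct ArgDesc {
  bool isVector;        // e.g. __m128, __m256, __m512
  unsigned alignBytes;  // alignment of the type, in bytes
  bool alignIsRequired; // alignment explicitly requested (alignas, __declspec(align))
};

// Models the MSVC 2015+ x86_32 rule: up to three vector arguments go in
// [XYZ]MM0-2; a vector that finds no free register, and any other argument
// whose *required* alignment exceeds 4 bytes, is passed indirectly.
// Everything else is passed directly on the stack.
std::vector<ArgClass> classifyWin32Args(const std::vector<ArgDesc> &args) {
  const unsigned kMaxVectorRegs = 3;
  unsigned usedVectorRegs = 0;
  std::vector<ArgClass> result;
  result.reserve(args.size());
  for (const ArgDesc &a : args) {
    // Alignments are nonzero powers of two.
    assert(a.alignBytes != 0 && (a.alignBytes & (a.alignBytes - 1)) == 0);
    if (a.isVector) {
      if (usedVectorRegs < kMaxVectorRegs) {
        ++usedVectorRegs;
        result.push_back(ArgClass::VectorReg);
      } else {
        result.push_back(ArgClass::Indirect);
      }
    } else if (a.alignIsRequired && a.alignBytes > 4) {
      result.push_back(ArgClass::Indirect);
    } else {
      // e.g. double: 8-byte natural alignment, but not *required*.
      result.push_back(ArgClass::DirectOnStack);
    }
  }
  return result;
}
```

Under this model a `double` stays direct on the stack, a fourth vector argument becomes indirect, and a struct declared `alignas(32)` becomes indirect.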

Previously, MSVC never passed things truly indirectly, meaning clang
would always apply the byval attribute to indirect arguments. We had to
go to the trouble of adding inalloca so that non-trivially copyable C++
types could be passed in place without copying the object
representation. When inalloca was added, we asserted that all arguments
passed indirectly must use byval. With this change, that assert no
longer holds, and I had to update inalloca to handle that case. The
implicit sret pointer parameter was already handled this way, and this
change generalizes some of that logic to arguments.
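The difference between "byval in the argument memory" and "truly indirect" can be modeled in plain C++. Below, one slot of a hypothetical argument-memory block holds a pointer to a caller-owned temporary rather than the object itself. The struct names and the call shape `f(int, Overaligned)` are illustrative assumptions. The pattern mirrors the patch's `InAllocaIndirect` handling: the caller stores the temporary's address into the inalloca field, and the callee prologue loads it back.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical overaligned argument type (required 32-byte alignment).
struct alignas(32) Overaligned {
  int data[8];
};

// Hypothetical argument memory for a call f(int, Overaligned) when the
// Overaligned parameter is passed indirectly: the slot holds the address of
// a caller-owned temporary instead of a copy of the object, so the object's
// alignment no longer depends on where the argument memory lands.
struct ArgMemory {
  int first;               // stored in place, as before
  Overaligned *secondAddr; // indirect slot: pointer to the temporary
};

// Callee side: load the pointer out of the slot, then use the object.
int calleeSum(const ArgMemory &args) {
  const Overaligned *q = args.secondAddr;
  int sum = args.first;
  for (int v : q->data)
    sum += v;
  return sum;
}

// Caller side: materialize an aligned temporary, store its address into the
// argument slot, then "call" the callee.
int callerExample() {
  Overaligned tmp{};
  for (int i = 0; i < 8; ++i)
    tmp.data[i] = i;
  ArgMemory args{};
  args.first = 100;
  args.secondAddr = &tmp;
  assert(reinterpret_cast<std::uintptr_t>(args.secondAddr) %
             alignof(Overaligned) == 0);
  return calleeSum(args);
}
```

The temporary lives in the caller's frame for the duration of the call, which is why no byval copy into the argument memory is needed.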

There are two cases that this change leaves unfixed:

1. objects that are non-trivially copyable *and* overaligned
2. vectorcall + inalloca + vectors

For case 1, I need to touch C++ ABI code in MicrosoftCXXABI.cpp, so I
want to do it in a follow-up.

For case 2, my fix is one line, but it will require updating IR tests to
use lots of inreg, so I wanted to separate it out.

Related to D71915 and D72110

Fixes most of PR44395

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rnk created this revision. Jan 2 2020, 3:25 PM

Herald added a project: Restricted Project. Jan 2 2020, 3:25 PM

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Harbormaster failed remote builds in B43202: Diff 235959! Jan 2 2020, 3:36 PM

I think this is alright, I want @ctopper to take a look before I approve it though. Additionally, do you know if this modifies the regcall calling convention at all? Should it?

clang/test/CodeGenCXX/inalloca-vector.cpp
72	Are all the checks here on out disabled for a reason?

craig.topper added inline comments. Jan 3 2020, 10:55 AM

clang/test/CodeGen/x86_32-arguments-win32.c
77	What happens in the backend with inreg if 512-bit vectors aren't legal?

rjmccall added inline comments. Jan 3 2020, 11:39 AM

clang/include/clang/CodeGen/CGFunctionInfo.h
91	Would it be better to handle `inalloca` differently, maybe as a flag rather than as a top-level kind? I'm concerned about gradually duplicating a significant amount of the expressivity of other kinds.

rnk marked 2 inline comments as done. Jan 3 2020, 11:54 AM

rnk added inline comments.

clang/test/CodeGen/x86_32-arguments-win32.c
77	LLVM splits the vector up using the largest legal vector size. As many pieces as possible are passed in available XMM/YMM registers, and the rest are passed in memory. MSVC, of course, assumes the user wanted the larger vector size, and uses whatever vector instructions it needs to move the arguments around.

Previously, I advocated for a model where calling an Intel intrinsic function had the effect of implicitly marking the caller with the target attributes of the intrinsic. This falls down if the user tries to write a single function that conditionally branches between code that uses different instruction set extensions. You can imagine the SSE2 codepath accidentally using AVX instructions because the compiler thinks they are better. I'm told that ICC models CPU micro-architectural features in the CFG, but I don't ever expect that LLVM will do that. If we're stuck with per-function CPU feature settings, it seems like it would be nice to try to do what the user asked by default, and warn the user if we see them doing a cpuid check in a function that has been implicitly blessed with some target attributes. You could imagine doing a similar thing when large vector type variables are used: if a large vector argument or local is used, implicitly enable the appropriate target features to move vectors of that size around.

This idea didn't get anywhere, and the current situation has persisted. You know, maybe we should just keep clang the way it is, and just set up a warning in the backend that says "hey, I split your large vector. You probably didn't want that." And then we just continue doing what we do now. Nobody likes backend warnings, but it seems better than the current direction of the frontend knowing every detail of x86 vector extensions.
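The splitting behavior described in the comment above can be modeled with a little arithmetic. This is a simplified, hypothetical sketch of that description (`splitVectorArg` and its parameters are illustrative names), not LLVM's type legalizer. It assumes the vector width is a multiple of the largest legal vector register width.

```cpp
#include <algorithm>
#include <cassert>

struct SplitResult {
  unsigned pieces;      // number of legal-width pieces
  unsigned inRegisters; // pieces that landed in free vector registers
  unsigned inMemory;    // pieces that spilled to stack memory
};

// Cut a vectorBits-wide vector into legalBits-wide pieces, fill the free
// vector registers first, and send the remainder to memory.
SplitResult splitVectorArg(unsigned vectorBits, unsigned legalBits,
                           unsigned freeVectorRegs) {
  // Assumption: the vector is at least as wide as, and a multiple of, the
  // legal width.
  assert(legalBits > 0 && vectorBits >= legalBits &&
         vectorBits % legalBits == 0);
  unsigned pieces = vectorBits / legalBits;
  unsigned inRegs = std::min(pieces, freeVectorRegs);
  return {pieces, inRegs, pieces - inRegs};
}
```

For example, a 512-bit vector on a target where only 128-bit vectors are legal, with three free XMM registers, becomes four pieces: three in registers and one in memory. MSVC instead assumes the user wanted the full 512-bit register.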
clang/test/CodeGenCXX/inalloca-vector.cpp
72	Yes, this is case 2 in the commit message. I won't close the bug without coming back to this.

rjmccall added inline comments. Jan 3 2020, 12:21 PM

clang/test/CodeGen/x86_32-arguments-win32.c
77	If target attributes affect ABI, it seems really dangerous to implicitly set attributes based on what intrinsics are called. The local CPU-testing problem seems similar to the problems with local `#pragma STDC FENV_ACCESS` blocks that the constrained-FP people are looking into. They both have a "this operation is normally fully optimizable, but we might need to be more careful in specific functions" aspect to them. I wonder if there's a reasonable way to unify the approaches, or at least benefit from lessons learned.

rnk marked 2 inline comments as done. Jan 3 2020, 2:44 PM

rnk added inline comments.

clang/include/clang/CodeGen/CGFunctionInfo.h
91	In the past, I've drafted more than one unfinished design for how we could remodel inalloca with tokens so that it can be per-argument instead of something that applies to all argument memory. Unfortunately, I never found the time to finish or implement one. As I was working on this patch, I was thinking to myself that this could be the moment to implement one of those designs, but it would be pretty involved. Part of the issue is that, personally, I have very little interest in improving x86_32 code quality, so a big redesign wouldn't deliver much benefit. The benefits would all be code simplifications and maintenance cost reductions, which are nice, but seem to only get me through the prototype design stage. I'll go dig up my last doc and try to share it, but for now, I think we have to suffer the extra inalloca code in this patch.
clang/test/CodeGen/x86_32-arguments-win32.c
77	I agree, we wouldn't want intrinsic usage to change ABI. But, does anybody actually want the behavior that LLVM implements today where large vectors get split across registers and memory? I think most users who pass a 512 bit vector want it to either be passed in ZMM registers or fail to compile. Why do we even support passing 1024 bit vectors? Could we make that an error? Anyway, major redesigns aside, should clang do something when large vectors are passed? Maybe we should warn here? Passing by address is usually the safest way to pass something, so that's an option. Implementing splitting logic in clang doesn't seem worth it.

rnk marked an inline comment as done. Jan 3 2020, 3:02 PM

rnk added inline comments.

clang/include/clang/CodeGen/CGFunctionInfo.h
91	Here's what I wrote, with some sketches of possible LLVM IR that could replace inalloca: https://reviews.llvm.org/P8183 The basic idea is that we need a call setup instruction that forms a region with the call. During CodeGen, we can look forward (potentially across BBs) to determine how much argument stack memory to allocate, allocate it (perhaps in pieces as we go along), and then skip the normal call stack argument adjustment during regular call lowering. Suggestions for names better than "argmem" are welcome. The major complication comes from edges that exit the call setup region. These could be exceptional exits or normal exits with statement expressions and break, return, or goto. Along these paths we need to adjust SP, or risk leaking stack memory. Today, with inalloca, I think we leak stack memory.

rjmccall added inline comments. Jan 3 2020, 5:47 PM

clang/include/clang/CodeGen/CGFunctionInfo.h
91	> In the past, I've drafted more than one unfinished design for how we could remodel inalloca with tokens so that it can be per-argument instead of something that applies to all argument memory. Unfortunately, I never found the time to finish or implement one.

Sorry! I think it would be great to rethink `inalloca` to avoid the duplication and so on, but I certainly didn't mean to suggest that we should do that as part of this patch. (I'll look at your proposal later.) I was trying to ask if it would make sense to change how `inalloca` arguments are represented by `ABIInfo`, so that we could e.g. build a normal indirect `ABIInfo` and then flag that it also needs to be written into an `inalloca` buffer.
clang/test/CodeGen/x86_32-arguments-win32.c
77	> I agree, we wouldn't want intrinsic usage to change ABI. But, does anybody actually want the behavior that LLVM implements today where large vectors get split across registers and memory?

I take it you're implying that the actual (Windows-only?) platform ABI doesn't say anything about this because other compilers don't allow large vectors. How large are the vectors we do have ABI rules for? Do they have the same problem as the SysV ABI, where the ABI rules are sensitive to compiler flags?

Anyway, I didn't realize the i386 Windows ABI ever used registers for arguments. (Whether you can convince LLVM to do so for a function signature that Clang isn't supposed to emit for ABI-conforming functions is a separate story.) You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?

AntonYudintsev added a subscriber: AntonYudintsev. Jan 6 2020, 11:31 PM

AntonYudintsev added inline comments.

clang/test/CodeGen/x86_32-arguments-win32.c
77	> You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?

Sorry to cut in (I am the one who reported the issue, so I'm looking forward to this patch being merged). Yes, the MSVC/x86 Windows ABI uses three registers for the first three vector arguments. https://godbolt.org/z/PZ3dBa

rjmccall added inline comments. Jan 7 2020, 12:01 AM

clang/test/CodeGen/x86_32-arguments-win32.c
77	Interesting. And anything that would cause the stack to be used is still banned: passing more than 3 vectors or passing them as variadic arguments. I guess they chose not to implement stack realignment when they implemented this, and rather than being stuck with a suboptimal ABI, they just banned the cases that would have required it. Technically that means that they haven't committed to an ABI, so even though LLVM is perfectly happy to realign the stack when required, we shouldn't actually take advantage of that here, and instead we should honor the same restriction.

AntonYudintsev added inline comments. Jan 7 2020, 7:51 AM

clang/test/CodeGen/x86_32-arguments-win32.c
77	As I mentioned in https://reviews.llvm.org/D71915 (and in https://bugs.llvm.org/show_bug.cgi?id=44395), there is at least one particular case where they do align the stack for aligned arguments, although it is not directly a function call.

rjmccall added inline comments. Jan 7 2020, 9:29 AM

clang/test/CodeGen/x86_32-arguments-win32.c
77	Oh, I see. Then they have in fact implemented stack realignment, sometime between MSVC v19.10 and v19.14, so the ABI is now settled; I should've checked a more recent compiler than the one you linked. So this is now a fairly standard in-registers-up-to-a-limit ABI, except that the limit is only non-zero for vectors. (Highly-aligned struct types are passed indirectly.)

AntonYudintsev added inline comments. Jan 8 2020, 12:20 AM

clang/test/CodeGen/x86_32-arguments-win32.c
77	Sorry for not being clear enough :(. Yes, newer MSVC versions seem to have implemented stack realignment universally. The older one (19.10) only did that in particular cases (like the one I mentioned in my original patchset).

rebase

clang/include/clang/CodeGen/CGFunctionInfo.h
91	I see. I think implementing that would require a greater refactoring of ClangToLLVMArgMapping. We'd need a new place to store the inalloca struct field index. That is currently put in a union with some other stuff, which would no longer work.
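The union constraint can be illustrated with two layouts. The structs below are hypothetical simplifications of the `ABIArgInfo` payload union shown in the CGFunctionInfo.h diff. Today `IndirectAlign` and `AllocaFieldIndex` can share storage because an argument is either Indirect or InAlloca, never both. If inalloca became a flag on an Indirect argument, both values would be live at the same time, so the field index would need its own storage.

```cpp
// Illustration only (hypothetical names). Kind-specific payloads share
// storage because exactly one kind is active at a time.
struct PayloadToday {
  union {
    unsigned DirectOffset;     // Direct / Extend
    unsigned IndirectAlign;    // Indirect
    unsigned AllocaFieldIndex; // InAlloca
  };
};

// If "written into an inalloca buffer" were a flag on an Indirect argument,
// IndirectAlign and AllocaFieldIndex would both be needed at once, so the
// field index has to move out of the union.
struct PayloadWithInAllocaFlag {
  union {
    unsigned DirectOffset;
    unsigned IndirectAlign;
  };
  unsigned AllocaFieldIndex;
  bool InAllocaFlag;
};
```

The extra storage is small, but as the comment notes, the index is also threaded through other argument-mapping code, which is where most of the refactoring cost would be.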
clang/test/CodeGen/x86_32-arguments-win32.c
77	To clarify, it's not actually stack realignment; they pass highly aligned things indirectly to avoid having to implement stack realignment (https://godbolt.org/z/XwKuyC). Realigning the stack for arguments would be hard for MSVC, because the 32-bit compiler always pushes arguments in memory individually, adjusting the stack as it goes along. My patch implements that logic in the frontend, which is necessary for overaligned aggregates anyway, if not vectors.

Unit tests: fail. 61871 tests passed, 1 failed and 781 were skipped.

failed: Clang.PCH/ms-pch-macro.c

clang-tidy: unknown.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B44020: Diff 238152! Jan 14 2020, 6:19 PM

rjmccall added inline comments. Jan 16 2020, 10:15 AM

clang/include/clang/CodeGen/CGFunctionInfo.h
91	Okay. Then in the short term, I guess this is fine, and the long-term improvement is to change the `inalloca` design.

LGTM

This revision is now accepted and ready to land. Jan 22 2020, 8:29 AM

Closed by commit rG2af74e27ed7d: [MS] Overhaul how clang passes overaligned args on x86_32 (authored by rnk). Jan 23 2020, 4:11 PM

This revision was automatically updated to reflect the committed changes.

rnk mentioned this in D74452: [MS] Mark vectorcall FP and vector args inreg. Feb 11 2020, 5:11 PM

rnk mentioned this in D74455: [MS] Pass aligned, non-trivially copyable things indirectly on x86. Feb 11 2020, 5:36 PM

rnk mentioned this in rG0edb2129258c: [MS] Mark vectorcall FP and vector args inreg. Feb 19 2020, 4:42 PM

rnk mentioned this in D87923: [MS] On x86_32, pass overaligned, non-copyable arguments indirectly. Sep 18 2020, 11:29 AM

rnk mentioned this in rG3b3a16548568: [MS] On x86_32, pass overaligned, non-copyable arguments indirectly. Sep 21 2020, 11:49 AM

Dushistov added a subscriber: Dushistov. Jun 11 2023, 1:46 PM

Herald added a project: Restricted Project. Jun 11 2023, 1:46 PM

Revision Contents

- clang/include/clang/CodeGen/CGFunctionInfo.h: 17 lines
- clang/lib/CodeGen/CGCall.cpp: 36 lines
- clang/lib/CodeGen/TargetInfo.cpp: 74 lines
- clang/test/CodeGen/x86_32-arguments-win32.c: 44 lines
- clang/test/CodeGenCXX/inalloca-overaligned.cpp: 52 lines
- clang/test/CodeGenCXX/inalloca-vector.cpp: 79 lines

Diff 240041

clang/include/clang/CodeGen/CGFunctionInfo.h

[82 unchanged lines not shown]	private:
union {		union {
unsigned DirectOffset; // isDirect() || isExtend()		unsigned DirectOffset; // isDirect() || isExtend()
unsigned IndirectAlign; // isIndirect()		unsigned IndirectAlign; // isIndirect()
unsigned AllocaFieldIndex; // isInAlloca()		unsigned AllocaFieldIndex; // isInAlloca()
};		};
Kind TheKind;		Kind TheKind;
bool PaddingInReg : 1;		bool PaddingInReg : 1;
bool InAllocaSRet : 1; // isInAlloca()		bool InAllocaSRet : 1; // isInAlloca()
		bool InAllocaIndirect : 1;// isInAlloca()
bool IndirectByVal : 1; // isIndirect()		bool IndirectByVal : 1; // isIndirect()
bool IndirectRealign : 1; // isIndirect()		bool IndirectRealign : 1; // isIndirect()
bool SRetAfterThis : 1; // isIndirect()		bool SRetAfterThis : 1; // isIndirect()
bool InReg : 1; // isDirect() || isExtend() || isIndirect()		bool InReg : 1; // isDirect() || isExtend() || isIndirect()
bool CanBeFlattened: 1; // isDirect()		bool CanBeFlattened: 1; // isDirect()
bool SignExt : 1; // isExtend()		bool SignExt : 1; // isExtend()

bool canHavePaddingType() const {		bool canHavePaddingType() const {
return isDirect() || isExtend() || isIndirect() || isExpand();		return isDirect() || isExtend() || isIndirect() || isExpand();
}		}
void setPaddingType(llvm::Type *T) {		void setPaddingType(llvm::Type *T) {
assert(canHavePaddingType());		assert(canHavePaddingType());
PaddingType = T;		PaddingType = T;
}		}

void setUnpaddedCoerceToType(llvm::Type *T) {		void setUnpaddedCoerceToType(llvm::Type *T) {
assert(isCoerceAndExpand());		assert(isCoerceAndExpand());
UnpaddedCoerceAndExpandType = T;		UnpaddedCoerceAndExpandType = T;
}		}

public:		public:
ABIArgInfo(Kind K = Direct)		ABIArgInfo(Kind K = Direct)
: TypeData(nullptr), PaddingType(nullptr), DirectOffset(0),		: TypeData(nullptr), PaddingType(nullptr), DirectOffset(0), TheKind(K),
TheKind(K), PaddingInReg(false), InAllocaSRet(false),		PaddingInReg(false), InAllocaSRet(false), InAllocaIndirect(false),
IndirectByVal(false), IndirectRealign(false), SRetAfterThis(false),		IndirectByVal(false), IndirectRealign(false), SRetAfterThis(false),
InReg(false), CanBeFlattened(false), SignExt(false) {}		InReg(false), CanBeFlattened(false), SignExt(false) {}

static ABIArgInfo getDirect(llvm::Type *T = nullptr, unsigned Offset = 0,		static ABIArgInfo getDirect(llvm::Type *T = nullptr, unsigned Offset = 0,
llvm::Type *Padding = nullptr,		llvm::Type *Padding = nullptr,
bool CanBeFlattened = true) {		bool CanBeFlattened = true) {
auto AI = ABIArgInfo(Direct);		auto AI = ABIArgInfo(Direct);
AI.setCoerceToType(T);		AI.setCoerceToType(T);
[57 unchanged lines not shown]	static ABIArgInfo getIndirect(CharUnits Alignment, bool ByVal = true,
return AI;		return AI;
}		}
static ABIArgInfo getIndirectInReg(CharUnits Alignment, bool ByVal = true,		static ABIArgInfo getIndirectInReg(CharUnits Alignment, bool ByVal = true,
bool Realign = false) {		bool Realign = false) {
auto AI = getIndirect(Alignment, ByVal, Realign);		auto AI = getIndirect(Alignment, ByVal, Realign);
AI.setInReg(true);		AI.setInReg(true);
return AI;		return AI;
}		}
static ABIArgInfo getInAlloca(unsigned FieldIndex) {		static ABIArgInfo getInAlloca(unsigned FieldIndex, bool Indirect = false) {
auto AI = ABIArgInfo(InAlloca);		auto AI = ABIArgInfo(InAlloca);
AI.setInAllocaFieldIndex(FieldIndex);		AI.setInAllocaFieldIndex(FieldIndex);
		AI.setInAllocaIndirect(Indirect);
return AI;		return AI;
}		}
static ABIArgInfo getExpand() {		static ABIArgInfo getExpand() {
auto AI = ABIArgInfo(Expand);		auto AI = ABIArgInfo(Expand);
AI.setPaddingType(nullptr);		AI.setPaddingType(nullptr);
return AI;		return AI;
}		}
static ABIArgInfo getExpandWithPadding(bool PaddingInReg,		static ABIArgInfo getExpandWithPadding(bool PaddingInReg,
[176 unchanged lines not shown]	unsigned getInAllocaFieldIndex() const {
assert(isInAlloca() && "Invalid kind!");		assert(isInAlloca() && "Invalid kind!");
return AllocaFieldIndex;		return AllocaFieldIndex;
}		}
void setInAllocaFieldIndex(unsigned FieldIndex) {		void setInAllocaFieldIndex(unsigned FieldIndex) {
assert(isInAlloca() && "Invalid kind!");		assert(isInAlloca() && "Invalid kind!");
AllocaFieldIndex = FieldIndex;		AllocaFieldIndex = FieldIndex;
}		}

		unsigned getInAllocaIndirect() const {
		assert(isInAlloca() && "Invalid kind!");
		return InAllocaIndirect;
		}
		void setInAllocaIndirect(bool Indirect) {
		assert(isInAlloca() && "Invalid kind!");
		InAllocaIndirect = Indirect;
		}

/// Return true if this field of an inalloca struct should be returned		/// Return true if this field of an inalloca struct should be returned
/// to implement a struct return calling convention.		/// to implement a struct return calling convention.
bool getInAllocaSRet() const {		bool getInAllocaSRet() const {
assert(isInAlloca() && "Invalid kind!");		assert(isInAlloca() && "Invalid kind!");
return InAllocaSRet;		return InAllocaSRet;
}		}

void setInAllocaSRet(bool SRet) {		void setInAllocaSRet(bool SRet) {
[322 unchanged lines not shown]

clang/lib/CodeGen/CGCall.cpp

[2,333 unchanged lines not shown]	for (FunctionArgList::const_iterator i = Args.begin(), e = Args.end();
std::tie(FirstIRArg, NumIRArgs) = IRFunctionArgs.getIRArgs(ArgNo);		std::tie(FirstIRArg, NumIRArgs) = IRFunctionArgs.getIRArgs(ArgNo);

switch (ArgI.getKind()) {		switch (ArgI.getKind()) {
case ABIArgInfo::InAlloca: {		case ABIArgInfo::InAlloca: {
assert(NumIRArgs == 0);		assert(NumIRArgs == 0);
auto FieldIndex = ArgI.getInAllocaFieldIndex();		auto FieldIndex = ArgI.getInAllocaFieldIndex();
Address V =		Address V =
Builder.CreateStructGEP(ArgStruct, FieldIndex, Arg->getName());		Builder.CreateStructGEP(ArgStruct, FieldIndex, Arg->getName());
		if (ArgI.getInAllocaIndirect())
		V = Address(Builder.CreateLoad(V),
		getContext().getTypeAlignInChars(Ty));
ArgVals.push_back(ParamValue::forIndirect(V));		ArgVals.push_back(ParamValue::forIndirect(V));
break;		break;
}		}

case ABIArgInfo::Indirect: {		case ABIArgInfo::Indirect: {
assert(NumIRArgs == 1);		assert(NumIRArgs == 1);
Address ParamAddr = Address(FnArgs[FirstIRArg], ArgI.getIndirectAlign());		Address ParamAddr = Address(FnArgs[FirstIRArg], ArgI.getIndirectAlign());

[1,683 unchanged lines not shown]	for (CallArgList::const_iterator I = CallArgs.begin(), E = CallArgs.end();
unsigned FirstIRArg, NumIRArgs;		unsigned FirstIRArg, NumIRArgs;
std::tie(FirstIRArg, NumIRArgs) = IRFunctionArgs.getIRArgs(ArgNo);		std::tie(FirstIRArg, NumIRArgs) = IRFunctionArgs.getIRArgs(ArgNo);

switch (ArgInfo.getKind()) {		switch (ArgInfo.getKind()) {
case ABIArgInfo::InAlloca: {		case ABIArgInfo::InAlloca: {
assert(NumIRArgs == 0);		assert(NumIRArgs == 0);
assert(getTarget().getTriple().getArch() == llvm::Triple::x86);		assert(getTarget().getTriple().getArch() == llvm::Triple::x86);
if (I->isAggregate()) {		if (I->isAggregate()) {
// Replace the placeholder with the appropriate argument slot GEP.
Address Addr = I->hasLValue()		Address Addr = I->hasLValue()
? I->getKnownLValue().getAddress(*this)		? I->getKnownLValue().getAddress(*this)
: I->getKnownRValue().getAggregateAddress();		: I->getKnownRValue().getAggregateAddress();
llvm::Instruction *Placeholder =		llvm::Instruction *Placeholder =
cast<llvm::Instruction>(Addr.getPointer());		cast<llvm::Instruction>(Addr.getPointer());

		if (!ArgInfo.getInAllocaIndirect()) {
		// Replace the placeholder with the appropriate argument slot GEP.
CGBuilderTy::InsertPoint IP = Builder.saveIP();		CGBuilderTy::InsertPoint IP = Builder.saveIP();
Builder.SetInsertPoint(Placeholder);		Builder.SetInsertPoint(Placeholder);
Addr =		Addr = Builder.CreateStructGEP(ArgMemory,
Builder.CreateStructGEP(ArgMemory, ArgInfo.getInAllocaFieldIndex());		ArgInfo.getInAllocaFieldIndex());
Builder.restoreIP(IP);		Builder.restoreIP(IP);
		} else {
		// For indirect things such as overaligned structs, replace the
		// placeholder with a regular aggregate temporary alloca. Store the
		// address of this alloca into the struct.
		Addr = CreateMemTemp(info_it->type, "inalloca.indirect.tmp");
		Address ArgSlot = Builder.CreateStructGEP(
		ArgMemory, ArgInfo.getInAllocaFieldIndex());
		Builder.CreateStore(Addr.getPointer(), ArgSlot);
		}
deferPlaceholderReplacement(Placeholder, Addr.getPointer());		deferPlaceholderReplacement(Placeholder, Addr.getPointer());
		} else if (ArgInfo.getInAllocaIndirect()) {
		// Make a temporary alloca and store the address of it into the argument
		// struct.
		Address Addr = CreateMemTempWithoutCast(
		I->Ty, getContext().getTypeAlignInChars(I->Ty),
		"indirect-arg-temp");
		I->copyInto(*this, Addr);
		Address ArgSlot =
		Builder.CreateStructGEP(ArgMemory, ArgInfo.getInAllocaFieldIndex());
		Builder.CreateStore(Addr.getPointer(), ArgSlot);
} else {		} else {
// Store the RValue into the argument struct.		// Store the RValue into the argument struct.
Address Addr =		Address Addr =
Builder.CreateStructGEP(ArgMemory, ArgInfo.getInAllocaFieldIndex());		Builder.CreateStructGEP(ArgMemory, ArgInfo.getInAllocaFieldIndex());
unsigned AS = Addr.getType()->getPointerAddressSpace();		unsigned AS = Addr.getType()->getPointerAddressSpace();
llvm::Type *MemType = ConvertTypeForMem(I->Ty)->getPointerTo(AS);		llvm::Type *MemType = ConvertTypeForMem(I->Ty)->getPointerTo(AS);
// There are some cases where a trivial bitcast is not avoidable. The		// There are some cases where a trivial bitcast is not avoidable. The
// definition of a type later in a translation unit may change its type		// definition of a type later in a translation unit may change its type
[700 unchanged lines not shown]

clang/lib/CodeGen/TargetInfo.cpp


[1,670 unchanged lines not shown]
ABIArgInfo X86_32ABIInfo::classifyArgumentType(QualType Ty,		ABIArgInfo X86_32ABIInfo::classifyArgumentType(QualType Ty,
CCState &State) const {		CCState &State) const {
// FIXME: Set alignment on indirect arguments.		// FIXME: Set alignment on indirect arguments.
bool IsFastCall = State.CC == llvm::CallingConv::X86_FastCall;		bool IsFastCall = State.CC == llvm::CallingConv::X86_FastCall;
bool IsRegCall = State.CC == llvm::CallingConv::X86_RegCall;		bool IsRegCall = State.CC == llvm::CallingConv::X86_RegCall;
bool IsVectorCall = State.CC == llvm::CallingConv::X86_VectorCall;		bool IsVectorCall = State.CC == llvm::CallingConv::X86_VectorCall;

Ty = useFirstFieldIfTransparentUnion(Ty);		Ty = useFirstFieldIfTransparentUnion(Ty);
		TypeInfo TI = getContext().getTypeInfo(Ty);

// Check with the C++ ABI first.		// Check with the C++ ABI first.
const RecordType *RT = Ty->getAs<RecordType>();		const RecordType *RT = Ty->getAs<RecordType>();
if (RT) {		if (RT) {
CGCXXABI::RecordArgABI RAA = getRecordArgABI(RT, getCXXABI());		CGCXXABI::RecordArgABI RAA = getRecordArgABI(RT, getCXXABI());
if (RAA == CGCXXABI::RAA_Indirect) {		if (RAA == CGCXXABI::RAA_Indirect) {
return getIndirectResult(Ty, false, State);		return getIndirectResult(Ty, false, State);
} else if (RAA == CGCXXABI::RAA_DirectInMemory) {		} else if (RAA == CGCXXABI::RAA_DirectInMemory) {
[33 unchanged lines not shown]	if (isAggregateTypeForABI(Ty)) {
     if (!IsWin32StructABI && isEmptyRecord(getContext(), Ty, true))
       return ABIArgInfo::getIgnore();
 
     llvm::LLVMContext &LLVMContext = getVMContext();
     llvm::IntegerType *Int32 = llvm::Type::getInt32Ty(LLVMContext);
     bool NeedsPadding = false;
     bool InReg;
     if (shouldAggregateUseDirect(Ty, State, InReg, NeedsPadding)) {
-      unsigned SizeInRegs = (getContext().getTypeSize(Ty) + 31) / 32;
+      unsigned SizeInRegs = (TI.Width + 31) / 32;
       SmallVector<llvm::Type*, 3> Elements(SizeInRegs, Int32);
       llvm::Type *Result = llvm::StructType::get(LLVMContext, Elements);
       if (InReg)
         return ABIArgInfo::getDirectInReg(Result);
       else
         return ABIArgInfo::getDirect(Result);
     }
     llvm::IntegerType *PaddingType = NeedsPadding ? Int32 : nullptr;
 
+    // Pass over-aligned aggregates on Windows indirectly. This behavior was
+    // added in MSVC 2015.
+    if (IsWin32StructABI && TI.AlignIsRequired && TI.Align > 32)
+      return getIndirectResult(Ty, /*ByVal=*/false, State);
+
     // Expand small (<= 128-bit) record types when we know that the stack layout
     // of those arguments will match the struct. This is important because the
     // LLVM backend isn't smart enough to remove byval, which inhibits many
     // optimizations.
     // Don't do this for the MCU if there are still free integer registers
     // (see X86_64 ABI for full explanation).
-    if (getContext().getTypeSize(Ty) <= 4 * 32 &&
-        (!IsMCUABI || State.FreeRegs == 0) && canExpandIndirectArgument(Ty))
+    if (TI.Width <= 4 * 32 && (!IsMCUABI || State.FreeRegs == 0) &&
+        canExpandIndirectArgument(Ty))
       return ABIArgInfo::getExpandWithPadding(
           IsFastCall || IsVectorCall || IsRegCall, PaddingType);
 
     return getIndirectResult(Ty, true, State);
   }
 
   if (const VectorType *VT = Ty->getAs<VectorType>()) {
+    // On Windows, vectors are passed directly if registers are available, or
+    // indirectly if not. This avoids the need to align argument memory. Pass
+    // user-defined vector types larger than 512 bits indirectly for simplicity.
+    if (IsWin32StructABI) {
+      if (TI.Width <= 512 && State.FreeSSERegs > 0) {
+        --State.FreeSSERegs;
+        return ABIArgInfo::getDirectInReg();
+      }
+      return getIndirectResult(Ty, /*ByVal=*/false, State);
+    }
+
     // On Darwin, some vectors are passed in memory, we handle this by passing
     // it as an i8/i16/i32/i64.
     if (IsDarwinVectorABI) {
-      uint64_t Size = getContext().getTypeSize(Ty);
-      if ((Size == 8 || Size == 16 || Size == 32) ||
-          (Size == 64 && VT->getNumElements() == 1))
-        return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(),
-                                                            Size));
+      if ((TI.Width == 8 || TI.Width == 16 || TI.Width == 32) ||
+          (TI.Width == 64 && VT->getNumElements() == 1))
+        return ABIArgInfo::getDirect(
+            llvm::IntegerType::get(getVMContext(), TI.Width));
     }
 
     if (IsX86_MMXType(CGT.ConvertType(Ty)))
       return ABIArgInfo::getDirect(llvm::IntegerType::get(getVMContext(), 64));
 
     return ABIArgInfo::getDirect();
   }
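As a reading aid, the two Win32 rules added to classifyArgumentType above can be modeled in isolation. The sketch below is hypothetical: `ArgDesc`, `Win32ArgClass`, and `classifyWin32` are illustrative names, not clang APIs, and the model deliberately skips the C++ ABI check and the register-direct aggregate path that run earlier in the real function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not clang's API) of the two Win32 rules added to
// X86_32ABIInfo::classifyArgumentType. Widths and alignments are in bits,
// like clang's TypeInfo::Width and TypeInfo::Align.
enum class Win32ArgClass {
  VectorInReg,  // vector passed directly in the next free SSE register
  Indirect,     // caller passes a pointer to a temporary (not byval)
  Unchanged     // falls through to the pre-existing x86_32 rules
};

struct ArgDesc {         // hypothetical stand-in for QualType + TypeInfo
  bool IsVector;
  bool IsAggregate;
  uint64_t Width;        // size in bits
  unsigned Align;        // alignment in bits
  bool AlignIsRequired;  // alignment was explicitly requested by the user
};

// FreeSSERegs starts at 3 for the default Win32 convention after this patch.
Win32ArgClass classifyWin32(const ArgDesc &A, unsigned &FreeSSERegs) {
  assert(A.Align % 8 == 0 && "alignment is expressed in bits");
  if (A.IsAggregate) {
    // Required alignment above 32 bits (4 bytes): pass by address.
    if (A.AlignIsRequired && A.Align > 32)
      return Win32ArgClass::Indirect;
    return Win32ArgClass::Unchanged;
  }
  if (A.IsVector) {
    // Vectors up to 512 bits use a register while one is free.
    if (A.Width <= 512 && FreeSSERegs > 0) {
      --FreeSSERegs;
      return Win32ArgClass::VectorInReg;
    }
    // Out of registers, or an oversized vector: pass by address.
    return Win32ArgClass::Indirect;
  }
  // Scalars such as double (64-bit alignment, but not required) are not
  // affected by these rules.
  return Win32ArgClass::Unchanged;
}
```

With five `__m128` arguments the model yields three register vectors followed by two indirect ones, which is the shape the `receive_vec_128` CHECK line expects; a `__declspec(align(64))` aggregate is indirect regardless of register availability.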

@@ 13 unchanged lines not shown @@ if (InReg)
     return ABIArgInfo::getDirectInReg();
   return ABIArgInfo::getDirect();
 }

 void X86_32ABIInfo::computeInfo(CGFunctionInfo &FI) const {
   CCState State(FI);
   if (IsMCUABI)
     State.FreeRegs = 3;
-  else if (State.CC == llvm::CallingConv::X86_FastCall)
+  else if (State.CC == llvm::CallingConv::X86_FastCall) {
     State.FreeRegs = 2;
-  else if (State.CC == llvm::CallingConv::X86_VectorCall) {
+    State.FreeSSERegs = 3;
+  } else if (State.CC == llvm::CallingConv::X86_VectorCall) {
     State.FreeRegs = 2;
     State.FreeSSERegs = 6;
   } else if (FI.getHasRegParm())
     State.FreeRegs = FI.getRegParm();
   else if (State.CC == llvm::CallingConv::X86_RegCall) {
     State.FreeRegs = 5;
     State.FreeSSERegs = 8;
+  } else if (IsWin32StructABI) {
+    // Since MSVC 2015, the first three SSE vectors have been passed in
+    // registers. The rest are passed indirectly.
+    State.FreeRegs = DefaultNumRegisterParameters;
+    State.FreeSSERegs = 3;
   } else
     State.FreeRegs = DefaultNumRegisterParameters;
 
   if (!::classifyReturnType(getCXXABI(), FI, *this)) {
     FI.getReturnInfo() = classifyReturnType(FI.getReturnType(), State);
   } else if (FI.getReturnInfo().isIndirect()) {
     // The C++ ABI is not aware of register usage, so we have to check if the
     // return value was sret and put it in a register ourselves if appropriate.
@@ 30 unchanged lines not shown @@ if (UsedInAlloca)
     rewriteWithInAlloca(FI);
 }
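The fastcall change in computeInfo (FreeSSERegs = 3) interacts with the indirect path: once XMM0-2 are used up, a vector becomes a pointer, and that pointer may still take a free integer register. A minimal sketch, assuming only 128-bit vectors and 32-bit ints and using the hypothetical names `ParamKind` and `lowerFastcall` (not clang code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of how the __fastcall register budget set up
// in X86_32ABIInfo::computeInfo (2 GPRs, and after this patch 3 SSE registers)
// is consumed on Win32. Only 128-bit vectors and 32-bit ints are modeled.
enum class ParamKind { Vec128, Int32 };

std::vector<std::string> lowerFastcall(const std::vector<ParamKind> &Params) {
  unsigned FreeRegs = 2;     // ECX, EDX
  unsigned FreeSSERegs = 3;  // XMM0-XMM2
  std::vector<std::string> Out;
  for (ParamKind K : Params) {
    if (K == ParamKind::Vec128) {
      if (FreeSSERegs > 0) {
        --FreeSSERegs;
        Out.push_back("vec inreg");
      } else if (FreeRegs > 0) {
        // Indirect, non-byval: the pointer itself can take a free GPR.
        --FreeRegs;
        Out.push_back("ptr inreg");
      } else {
        Out.push_back("ptr");  // pointer passed on the stack
      }
    } else if (FreeRegs > 0) {
      --FreeRegs;
      Out.push_back("i32 inreg");
    } else {
      Out.push_back("i32");
    }
  }
  return Out;
}
```

Lowering `fastcall_indirect_vec(x, y, z, w, edx, q)` with this model gives three inreg vectors, an inreg pointer for `w`, `edx` in a register, and a stack pointer for `q`, which matches the corresponding CHECK line in x86_32-arguments-win32.c.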

 void
 X86_32ABIInfo::addFieldToArgStruct(SmallVector<llvm::Type *, 6> &FrameFields,
                                    CharUnits &StackOffset, ABIArgInfo &Info,
                                    QualType Type) const {
   // Arguments are always 4-byte-aligned.
-  CharUnits FieldAlign = CharUnits::fromQuantity(4);
+  CharUnits WordSize = CharUnits::fromQuantity(4);
+  assert(StackOffset.isMultipleOf(WordSize) && "unaligned inalloca struct");
 
-  assert(StackOffset.isMultipleOf(FieldAlign) && "unaligned inalloca struct");
-  Info = ABIArgInfo::getInAlloca(FrameFields.size());
-  FrameFields.push_back(CGT.ConvertTypeForMem(Type));
-  StackOffset += getContext().getTypeSizeInChars(Type);
+  // sret pointers and indirect things will require an extra pointer
+  // indirection, unless they are byval. Most things are byval, and will not
+  // require this indirection.
+  bool IsIndirect = false;
+  if (Info.isIndirect() && !Info.getIndirectByVal())
+    IsIndirect = true;
+  Info = ABIArgInfo::getInAlloca(FrameFields.size(), IsIndirect);
+  llvm::Type *LLTy = CGT.ConvertTypeForMem(Type);
+  if (IsIndirect)
+    LLTy = LLTy->getPointerTo(0);
+  FrameFields.push_back(LLTy);
+  StackOffset += IsIndirect ? WordSize : getContext().getTypeSizeInChars(Type);
 
   // Insert padding bytes to respect alignment.
   CharUnits FieldEnd = StackOffset;
-  StackOffset = FieldEnd.alignTo(FieldAlign);
+  StackOffset = FieldEnd.alignTo(WordSize);
   if (StackOffset != FieldEnd) {
     CharUnits NumBytes = StackOffset - FieldEnd;
     llvm::Type *Ty = llvm::Type::getInt8Ty(getVMContext());
     Ty = llvm::ArrayType::get(Ty, NumBytes.getQuantity());
     FrameFields.push_back(Ty);
   }
 }
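The offset arithmetic in addFieldToArgStruct can be checked by hand. This is a hypothetical model (`InAllocaArg`, `FrameLayout`, and `layoutInAllocaFrame` are illustrative names, not clang code) of how fields are placed: every field starts on a 4-byte boundary, and an indirect non-byval argument contributes a 4-byte pointer rather than its object representation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the offset arithmetic in
// X86_32ABIInfo::addFieldToArgStruct after this patch.
struct InAllocaArg {
  uint64_t SizeInBytes;  // size of the argument's type
  bool IndirectNoByVal;  // Info.isIndirect() && !Info.getIndirectByVal()
};

struct FrameLayout {
  std::vector<uint64_t> Offsets;  // byte offset of each argument's field
  uint64_t Size;                  // total size, including padding
};

FrameLayout layoutInAllocaFrame(const std::vector<InAllocaArg> &Args) {
  const uint64_t WordSize = 4;  // arguments are always 4-byte aligned
  FrameLayout L;
  uint64_t StackOffset = 0;
  for (const InAllocaArg &A : Args) {
    assert(StackOffset % WordSize == 0 && "unaligned inalloca struct");
    L.Offsets.push_back(StackOffset);
    // A non-byval indirect argument occupies one pointer-sized slot.
    StackOffset += A.IndirectNoByVal ? WordSize : A.SizeInBytes;
    // Insert padding bytes to restore 4-byte alignment.
    StackOffset = (StackOffset + WordSize - 1) / WordSize * WordSize;
  }
  L.Size = StackOffset;
  return L;
}
```

Under this model the 64-byte `OverAligned` argument from inalloca-overaligned.cpp occupies only bytes 4-7 of an 8-byte frame, consistent with the `<{ %struct.NonTrivial, %struct.OverAligned* }>` type expected by that test.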

 static bool isArgInAlloca(const ABIArgInfo &Info) {
   // Leave ignored and inreg arguments alone.
   switch (Info.getKind()) {
   case ABIArgInfo::InAlloca:
     return true;
-  case ABIArgInfo::Indirect:
-    assert(Info.getIndirectByVal());
-    return true;
   case ABIArgInfo::Ignore:
     return false;
+  case ABIArgInfo::Indirect:
   case ABIArgInfo::Direct:
   case ABIArgInfo::Extend:
-    if (Info.getInReg())
-      return false;
-    return true;
+    return !Info.getInReg();
   case ABIArgInfo::Expand:
   case ABIArgInfo::CoerceAndExpand:
     // These are aggregate types which are never passed in registers when
     // inalloca is involved.
     return true;
   }
   llvm_unreachable("invalid enum");
 }
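The isArgInAlloca change is easiest to see as a truth table. The sketch below is a standalone, hypothetical mirror of the switch above (`ArgKind` and `isArgInAllocaModel` are illustrative names; the real code switches on clang's ABIArgInfo kinds): Indirect now shares the Direct/Extend rule instead of asserting byval.

```cpp
#include <cassert>

// Hypothetical mirror of the ABIArgInfo kinds handled in the switch above;
// this is not clang's API.
enum class ArgKind {
  Direct, Extend, Indirect, Ignore, Expand, CoerceAndExpand, InAlloca
};

// Post-patch decision: does this argument live in the inalloca struct?
// Indirect is no longer assumed to be byval, so like Direct and Extend it
// stays out of the struct only when it was assigned a register (inreg).
bool isArgInAllocaModel(ArgKind K, bool InReg) {
  switch (K) {
  case ArgKind::InAlloca:
    return true;
  case ArgKind::Ignore:
    return false;
  case ArgKind::Indirect:
  case ArgKind::Direct:
  case ArgKind::Extend:
    return !InReg;
  case ArgKind::Expand:
  case ArgKind::CoerceAndExpand:
    // Aggregates that are never passed in registers when inalloca is used.
    return true;
  }
  assert(false && "invalid enum");
  return false;
}
```

In `fastcall_receive_vec` from inalloca-vector.cpp, this is why the pointer for `w` (inreg) is absent from the inalloca struct while the pointer for `q` is present.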
@@ 17 unchanged lines not shown @@ void X86_32ABIInfo::rewriteWithInAlloca(CGFunctionInfo &FI) const {
   if (Ret.isIndirect() && Ret.isSRetAfterThis() && !IsThisCall &&
       isArgInAlloca(I->info)) {
     addFieldToArgStruct(FrameFields, StackOffset, I->info, I->type);
     ++I;
   }
 
   // Put the sret parameter into the inalloca struct if it's in memory.
   if (Ret.isIndirect() && !Ret.getInReg()) {
-    CanQualType PtrTy = getContext().getPointerType(FI.getReturnType());
-    addFieldToArgStruct(FrameFields, StackOffset, Ret, PtrTy);
+    addFieldToArgStruct(FrameFields, StackOffset, Ret, FI.getReturnType());
     // On Windows, the hidden sret parameter is always returned in eax.
     Ret.setInAllocaSRet(IsWin32StructABI);
   }
 
   // Skip the 'this' parameter in ecx.
   if (IsThisCall)
     ++I;


clang/test/CodeGen/x86_32-arguments-win32.c

@@ 40 unchanged lines not shown @@
 // CHECK-LABEL: define dso_local i32 @f6_1()
 // CHECK-LABEL: define dso_local void @f6_2(float %a0.0)
 struct s6 {
   float a;
 };
 struct s6 f6_1(void) { while (1) {} }
 void f6_2(struct s6 a0) {}


+// MSVC passes up to three vectors in registers, and the rest indirectly. We
+// (arbitrarily) pass oversized vectors indirectly, since that is the safest way
+// to do it.
+typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));
+typedef float __m256 __attribute__((__vector_size__(32), __aligned__(32)));
+typedef float __m512 __attribute__((__vector_size__(64), __aligned__(64)));
+typedef float __m1024 __attribute__((__vector_size__(128), __aligned__(128)));
+
+__m128 gv128;
+__m256 gv256;
+__m512 gv512;
+__m1024 gv1024;
+
+void receive_vec_128(__m128 x, __m128 y, __m128 z, __m128 w, __m128 q) {
+  gv128 = x + y + z + w + q;
+}
+void receive_vec_256(__m256 x, __m256 y, __m256 z, __m256 w, __m256 q) {
+  gv256 = x + y + z + w + q;
+}
+void receive_vec_512(__m512 x, __m512 y, __m512 z, __m512 w, __m512 q) {
+  gv512 = x + y + z + w + q;
+}
+void receive_vec_1024(__m1024 x, __m1024 y, __m1024 z, __m1024 w, __m1024 q) {
+  gv1024 = x + y + z + w + q;
+}
+// CHECK-LABEL: define dso_local void @receive_vec_128(<4 x float> inreg %x, <4 x float> inreg %y, <4 x float> inreg %z, <4 x float>* %0, <4 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_256(<8 x float> inreg %x, <8 x float> inreg %y, <8 x float> inreg %z, <8 x float>* %0, <8 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_512(<16 x float> inreg %x, <16 x float> inreg %y, <16 x float> inreg %z, <16 x float>* %0, <16 x float>* %1)
craig.topper: What happens in the backend with inreg if 512-bit vectors aren't legal?

rnk: LLVM splits the vector up using the largest legal vector size. As many pieces as possible are passed in available XMM/YMM registers, and the rest are passed in memory. MSVC, of course, assumes the user wanted the larger vector size, and uses whatever vector instructions it needs to move the arguments around.

Previously, I advocated for a model where calling an Intel intrinsic function had the effect of implicitly marking the caller with the target attributes of the intrinsic. This falls down if the user tries to write a single function that conditionally branches between code that uses different instruction set extensions. You can imagine the SSE2 codepath accidentally using AVX instructions because the compiler thinks they are better. I'm told that ICC models CPU micro-architectural features in the CFG, but I don't ever expect that LLVM will do that. If we're stuck with per-function CPU feature settings, it seems like it would be nice to try to do what the user asked by default, and warn the user if we see them doing a cpuid check in a function that has been implicitly blessed with some target attributes. You could imagine doing a similar thing when large vector type variables are used: if a large vector argument or local is used, implicitly enable the appropriate target features to move vectors of that size around. This idea didn't get anywhere, and the current situation has persisted.

You know, maybe we should just keep clang the way it is, and just set up a warning in the backend that says "hey, I split your large vector. You probably didn't want that." And then we just continue doing what we do now. Nobody likes backend warnings, but it seems better than the current direction of the frontend knowing every detail of x86 vector extensions.

rjmccall: If target attributes affect ABI, it seems really dangerous to implicitly set attributes based on what intrinsics are called. The local CPU-testing problem seems similar to the problems with local `#pragma STDC FENV_ACCESS` blocks that the constrained-FP people are looking into. They both have a "this operation is normally fully optimizable, but we might need to be more careful in specific functions" aspect to them. I wonder if there's a reasonable way to unify the approaches, or at least benefit from lessons learned.

rnk: I agree, we wouldn't want intrinsic usage to change ABI. But does anybody actually want the behavior that LLVM implements today, where large vectors get split across registers and memory? I think most users who pass a 512-bit vector want it to either be passed in ZMM registers or fail to compile. Why do we even support passing 1024-bit vectors? Could we make that an error? Anyway, major redesigns aside, should clang do something when large vectors are passed? Maybe we should warn here? Passing by address is usually the safest way to pass something, so that's an option. Implementing splitting logic in clang doesn't seem worth it.

rjmccall:
> I agree, we wouldn't want intrinsic usage to change ABI. But, does anybody actually want the behavior that LLVM implements today where large vectors get split across registers and memory?

I take it you're implying that the actual (Windows-only?) platform ABI doesn't say anything about this because other compilers don't allow large vectors. How large are the vectors we do have ABI rules for? Do they have the same problem as the SysV ABI, where the ABI rules are sensitive to compiler flags? Anyway, I didn't realize the i386 Windows ABI ever used registers for arguments. (Whether you can convince LLVM to do so for a function signature that Clang isn't supposed to emit for ABI-conforming functions is a separate story.) You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?

AntonYudintsev:
> You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?

Sorry to cut in (I am the one who reported the issue, and so am looking forward to this patch being merged). Yes, the MSVC x86 Windows ABI uses three registers for the first three arguments. https://godbolt.org/z/PZ3dBa

rjmccall: Interesting. And anything that would cause the stack to be used is still banned: passing more than 3 vectors or passing them as variadic arguments. I guess they chose not to implement stack realignment when they implemented this, and rather than being stuck with a suboptimal ABI, they just banned the cases that would have required it. Technically that means that they haven't committed to an ABI, so even though LLVM is perfectly happy to realign the stack when required, we shouldn't actually take advantage of that here, and instead we should honor the same restriction.

AntonYudintsev: As I mentioned in https://reviews.llvm.org/D71915 (and in https://bugs.llvm.org/show_bug.cgi?id=44395), there is at least one particular case where they do align the stack for aligned arguments, although it is not directly a function call.

rjmccall: Oh, I see. Then they have in fact implemented stack realignment, sometime between MSVC v19.10 and v19.14, so the ABI is now settled; I should've checked a more recent compiler than the one you linked. So this is now a fairly standard in-registers-up-to-a-limit ABI, except that the limit is only non-zero for vectors. (Highly-aligned struct types are passed indirectly.)

AntonYudintsev: Sorry for not being clear enough :(. Yes, newer MSVC versions seem to have implemented stack realignment universally. The older one (19.10) only did that in particular cases (like the one I mentioned in my original patchset).

rnk: To clarify, it's not actually stack realignment: they pass highly aligned things indirectly to avoid having to implement stack realignment (https://godbolt.org/z/XwKuyC). Realigning the stack for arguments would be hard for MSVC, because the 32-bit compiler always pushes arguments in memory individually, adjusting the stack as it goes along. My patch implements that logic in the frontend, which is necessary for overaligned aggregates anyway, if not for vectors.
+// CHECK-LABEL: define dso_local void @receive_vec_1024(<32 x float>* %0, <32 x float>* %1, <32 x float>* %2, <32 x float>* %3, <32 x float>* %4)
+
+void pass_vec_128() {
+  __m128 z = {0};
+  receive_vec_128(z, z, z, z, z);
+}
+
+// CHECK-LABEL: define dso_local void @pass_vec_128()
+// CHECK: call void @receive_vec_128(<4 x float> inreg %{{[^,)]*}}, <4 x float> inreg %{{[^,)]*}}, <4 x float> inreg %{{[^,)]*}}, <4 x float>* %{{[^,)]*}}, <4 x float>* %{{[^,)]*}})
+
+
+void __fastcall fastcall_indirect_vec(__m128 x, __m128 y, __m128 z, __m128 w, int edx, __m128 q) {
+  gv128 = x + y + z + w + q;
+}
+// CHECK-LABEL: define dso_local x86_fastcallcc void @"\01@fastcall_indirect_vec@84"(<4 x float> inreg %x, <4 x float> inreg %y, <4 x float> inreg %z, <4 x float>* inreg %0, i32 inreg %edx, <4 x float>* %1)

clang/test/CodeGenCXX/inalloca-overaligned.cpp

This file was added.

				// RUN: %clang_cc1 -fms-extensions -w -triple i386-pc-win32 -emit-llvm -o - %s | FileCheck %s

				// PR44395
				// MSVC passes overaligned types indirectly since MSVC 2015. Make sure that
				// works with inalloca.

				// FIXME: Pass non-trivial and overaligned types indirectly. Right now the C++
				// ABI rules say to use inalloca, and they take precedence, so it's not easy to
				// implement this.


				struct NonTrivial {
				NonTrivial();
				NonTrivial(const NonTrivial &o);
				int x;
				};

				struct __declspec(align(64)) OverAligned {
				OverAligned();
				int buf[16];
				};

				extern int gvi32;

				int receive_inalloca_overaligned(NonTrivial nt, OverAligned o) {
				return nt.x + o.buf[0];
				}

				// CHECK-LABEL: define dso_local i32 @"?receive_inalloca_overaligned@@Y{{.*}}"
				// CHECK-SAME: (<{ %struct.NonTrivial, %struct.OverAligned* }>* inalloca %0)

				int pass_inalloca_overaligned() {
				gvi32 = receive_inalloca_overaligned(NonTrivial(), OverAligned());
				return gvi32;
				}

				// CHECK-LABEL: define dso_local i32 @"?pass_inalloca_overaligned@@Y{{.*}}"
				// CHECK: [[TMP:%[^ ]*]] = alloca %struct.OverAligned, align 64
				// CHECK: call i8* @llvm.stacksave()
				// CHECK: alloca inalloca <{ %struct.NonTrivial, %struct.OverAligned* }>

				// Construct OverAligned into TMP.
				// CHECK: call x86_thiscallcc %struct.OverAligned* @"??0OverAligned@@QAE@XZ"(%struct.OverAligned* [[TMP]])

				// Construct NonTrivial into the GEP.
				// CHECK: [[GEP:%[^ ]*]] = getelementptr inbounds <{ %struct.NonTrivial, %struct.OverAligned* }>, <{ %struct.NonTrivial, %struct.OverAligned* }>* %{{.*}}, i32 0, i32 0
				// CHECK: call x86_thiscallcc %struct.NonTrivial* @"??0NonTrivial@@QAE@XZ"(%struct.NonTrivial* [[GEP]])

				// Store the address of an OverAligned temporary into the struct.
				// CHECK: getelementptr inbounds <{ %struct.NonTrivial, %struct.OverAligned* }>, <{ %struct.NonTrivial, %struct.OverAligned* }>* %{{.*}}, i32 0, i32 1
				// CHECK: store %struct.OverAligned* [[TMP]], %struct.OverAligned** %{{.*}}, align 4
				// CHECK: call i32 @"?receive_inalloca_overaligned@@Y{{.*}}"(<{ %struct.NonTrivial, %struct.OverAligned* }>* inalloca %argmem)

clang/test/CodeGenCXX/inalloca-vector.cpp

This file was added.

				// RUN: %clang_cc1 -w -triple i686-pc-win32 -emit-llvm -o - %s | FileCheck %s

				// PR44395
				// MSVC passes up to three vectors in registers, and the rest indirectly. Check
				// that both are compatible with an inalloca prototype.

				struct NonTrivial {
				NonTrivial();
				NonTrivial(const NonTrivial &o);
				unsigned handle;
				};

				typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));
				__m128 gv128;

				// nt, w, and q will be in the inalloca pack.
				void receive_vec_128(NonTrivial nt, __m128 x, __m128 y, __m128 z, __m128 w, __m128 q) {
				gv128 = x + y + z + w + q;
				}
				// CHECK-LABEL: define dso_local void @"?receive_vec_128@@YAXUNonTrivial@@T__m128@@1111@Z"
				// CHECK-SAME: (<4 x float> inreg %x,
				// CHECK-SAME: <4 x float> inreg %y,
				// CHECK-SAME: <4 x float> inreg %z,
				// CHECK-SAME: <{ %struct.NonTrivial, <4 x float>*, <4 x float>* }>* inalloca %0)

				void pass_vec_128() {
				__m128 z = {0};
				receive_vec_128(NonTrivial(), z, z, z, z, z);
				}
				// CHECK-LABEL: define dso_local void @"?pass_vec_128@@YAXXZ"()
				// CHECK: getelementptr inbounds <{ %struct.NonTrivial, <4 x float>*, <4 x float>* }>, <{ %struct.NonTrivial, <4 x float>*, <4 x float>* }>* %{{[^,]*}}, i32 0, i32 0
				// CHECK: call x86_thiscallcc %struct.NonTrivial* @"??0NonTrivial@@QAE@XZ"(%struct.NonTrivial* %{{.*}})

				// Store q, store temp alloca.
				// CHECK: store <4 x float> %{{[^,]*}}, <4 x float>* %{{[^,]*}}, align 16
				// CHECK: getelementptr inbounds <{ %struct.NonTrivial, <4 x float>*, <4 x float>* }>, <{ %struct.NonTrivial, <4 x float>*, <4 x float>* }>* %{{[^,]*}}, i32 0, i32 1
				// CHECK: store <4 x float>* %{{[^,]*}}, <4 x float>** %{{[^,]*}}, align 4

				// Store w, store temp alloca.
				// CHECK: store <4 x float> %{{[^,]*}}, <4 x float>* %{{[^,]*}}, align 16
				// CHECK: getelementptr inbounds <{ %struct.NonTrivial, <4 x float>*, <4 x float>* }>, <{ %struct.NonTrivial, <4 x float>*, <4 x float>* }>* %{{[^,]*}}, i32 0, i32 2
				// CHECK: store <4 x float>* %{{[^,]*}}, <4 x float>** %{{[^,]*}}, align 4

				// CHECK: call void @"?receive_vec_128@@YAXUNonTrivial@@T__m128@@1111@Z"
				// CHECK-SAME: (<4 x float> inreg %{{[^,]*}},
				// CHECK-SAME: <4 x float> inreg %{{[^,]*}},
				// CHECK-SAME: <4 x float> inreg %{{[^,]*}},
				// CHECK-SAME: <{ %struct.NonTrivial, <4 x float>, <4 x float> }>* inalloca %{{[^,]*}})

				// w will be passed indirectly by register, and q will be passed indirectly, but
				// the pointer will be in memory.
				void __fastcall fastcall_receive_vec(__m128 x, __m128 y, __m128 z, __m128 w, int edx, __m128 q, NonTrivial nt) {
				gv128 = x + y + z + w + q;
				}
				// CHECK-LABEL: define dso_local x86_fastcallcc void @"?fastcall_receive_vec@@Y{{[^"]*}}"
				// CHECK-SAME: (<4 x float> inreg %x,
				// CHECK-SAME: <4 x float> inreg %y,
				// CHECK-SAME: <4 x float> inreg %z,
				// CHECK-SAME: <4 x float>* inreg %0,
				// CHECK-SAME: i32 inreg %edx,
				// CHECK-SAME: <{ <4 x float>*, %struct.NonTrivial }>* inalloca %1)


				void __vectorcall vectorcall_receive_vec(double xmm0, double xmm1, double xmm2,
				__m128 x, __m128 y, __m128 z,
				__m128 w, int edx, __m128 q, NonTrivial nt) {
				gv128 = x + y + z + w + q;
				}
				// FIXME: Enable these checks, clang generates wrong IR.
				// CHECK-LABEL: define dso_local x86_vectorcallcc void @"?vectorcall_receive_vec@@Y{{[^"]*}}"
				// CHECKX-SAME: (double inreg %xmm0,
				// CHECKX-SAME: double inreg %xmm1,
erichkeane: Are all the checks from here on out disabled for a reason?
rnk: Yes, this is case 2 in the commit message. I won't close the bug without coming back to this.
				// CHECKX-SAME: double inreg %xmm2,
				// CHECKX-SAME: <4 x float> inreg %x,
				// CHECKX-SAME: <4 x float> inreg %y,
				// CHECKX-SAME: <4 x float> inreg %z,
				// CHECKX-SAME: <4 x float>* inreg %0,
				// CHECKX-SAME: i32 inreg %edx,
				// CHECKX-SAME: <{ <4 x float>*, %struct.NonTrivial }>* inalloca %1)

Diff 240041
