This is an archive of the discontinued LLVM Phabricator instance.

Differential D134797

[X86][vectorcall] Make floating-type passed by value to match with MSVC
Needs Review · Public

Authored by pengfei on Sep 28 2022, 1:50 AM.

Details

Reviewers

rnk

Summary

Floating-point types are passed differently from vectors when the SSE registers are exhausted.
They are passed indirectly by value rather than by address. https://godbolt.org/z/Kbs89f36P

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pengfei created this revision. Sep 28 2022, 1:50 AM

Herald added a project: Restricted Project. Sep 28 2022, 1:50 AM

pengfei requested review of this revision. Sep 28 2022, 1:50 AM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B189113: Diff 463460. Sep 28 2022, 2:19 AM

rnk added inline comments. Sep 28 2022, 11:29 AM

clang/lib/CodeGen/TargetInfo.cpp
1868–1869	I would try to refactor this so that the vectorcall HFA that can't be passed in SSE regs falls through to the following logic. I suspect that it correctly handles each case that we care about:
- double: direct
- vector: indirect for alignment
- aggregate: indirect for alignment, any HFA will presumably be aligned to more than 32 bits
clang/test/CodeGen/vectorcall.c
157	Why not pass the double directly? That should be ABI compatible: https://gcc.godbolt.org/z/W4rjn63b5

Add HFA test.

clang/lib/CodeGen/TargetInfo.cpp
1868–1869	I'm not sure I understand correctly: the HFA is not a floating type, so it won't be affected. I added a test case for it. MSVC passes it indirectly too. https://gcc.godbolt.org/z/3qf4dTYfv
clang/test/CodeGen/vectorcall.c
157	Sorry, I'm not sure what you mean here. Do you mean I should use your example as the test case? This case mimics `vectorcall_indirect_vec`, which I think is intended to check whether the type, `inreg`, `byval`, etc. are generated correctly.

Harbormaster completed remote builds in B190121: Diff 464892. Oct 4 2022, 12:55 AM

rnk added inline comments. Oct 4 2022, 9:49 AM

clang/lib/CodeGen/TargetInfo.cpp
1868–1869	Thanks, I see what you mean. I thought the code for handling overaligned aggregates would trigger, passing any HFA indirectly, but it does not for plain FP HFAs. You can observe the difference by replacing `double` in HFA2 with `__int64`, and see that HFA2 is passed underaligned on the stack: https://gcc.godbolt.org/z/jqx4xcnjq
I still think this code would benefit from separating the regcall and vectorcall cases, something like:
    bool IsInReg = State.FreeSSERegs >= NumElts;
    if (IsInReg)
      State.FreeSSERegs -= NumElts;
    if (IsRegCall) {
      // handle regcall
      if (IsInReg)
        ...
    } else {
      // handle vectorcall
      if (IsInReg)
        ...
    }
They seem to have pretty different rules both when SSE regs are available and when not.
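For readers following the thread, the restructuring suggested above can be modelled outside of Clang. The following is only a minimal sketch, not the code in TargetInfo.cpp: `CallConv`, `TypeKind`, `Passing`, and `classify` are hypothetical stand-ins for Clang's types and for `ABIArgInfo`, and the model covers only arguments that qualify as homogeneous aggregates (an FP scalar, a vector, or an HFA/HVA).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for Clang's types; the names are illustrative only.
enum class CallConv { RegCall, VectorCall };
enum class TypeKind { Floating, Vector, Aggregate }; // Aggregate: an HFA/HVA struct
enum class Passing { Direct, DirectHva, Expand, Indirect };

struct State {
  uint64_t FreeSSERegs;
};

// Classify one argument that qualifies as a homogeneous aggregate with
// NumElts elements (a scalar or a single vector counts as one element).
Passing classify(CallConv CC, TypeKind Kind, uint64_t NumElts, State &S) {
  assert(NumElts > 0 && "a homogeneous aggregate has at least one element");

  // Both conventions first try to claim SSE registers.
  bool IsInReg = S.FreeSSERegs >= NumElts;
  if (IsInReg)
    S.FreeSSERegs -= NumElts;

  if (CC == CallConv::RegCall) {
    // Regcall passes scalars and vectors directly and flattens aggregates.
    if (IsInReg)
      return Kind == TypeKind::Aggregate ? Passing::Expand : Passing::Direct;
  } else {
    // Vectorcall passes floating-point scalars directly whether or not a
    // register is left for them.
    if (Kind == TypeKind::Floating)
      return Passing::Direct;
    // Vectors and aggregates stay unflattened while registers last.
    if (IsInReg)
      return Passing::DirectHva;
  }
  // Anything that did not get registers is passed by address.
  return Passing::Indirect;
}
```

Assuming six free SSE registers at the start (an assumption taken from the `xmm0` to `xmm5` parameter names in the test), five doubles and one vector exhaust the registers; a further `double` is then still classified as direct, while a further vector or a two-element HFA is classified as indirect.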
clang/test/CodeGen/vectorcall.c
157	I mean that these two LLVM prototypes are ABI compatible at the binary level for i686, but the second is much easier to optimize:
    double @byval(double* byval(double) %p) {
      %v = load double, double* %p
      ret double %v
    }
    double @direct(double %v) {
      ret double %v
    }
https://gcc.godbolt.org/z/MjEvdEKbT
Clang should generate the prototype.
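The compatibility claim above can be illustrated with a toy model of the caller's outgoing argument area. This is a sketch under stated assumptions, not a description of LLVM's lowering: `ArgArea`, `pushBytes`, `passByVal`, and `passDirect` are hypothetical names, and the model assumes that a `byval` argument is materialised as a copy of the pointee in the argument area while a direct `double` is stored there as its own eight bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the caller's outgoing argument area: a growing
// byte buffer, lowest address first.
using ArgArea = std::vector<unsigned char>;

// Append the raw bytes of an object to the argument area.
static void pushBytes(ArgArea &Area, const void *Src, std::size_t N) {
  assert(Src != nullptr && "need a source object to copy from");
  const unsigned char *P = static_cast<const unsigned char *>(Src);
  Area.insert(Area.end(), P, P + N);
}

// Models `double* byval(double) %p`: the caller copies the pointee into
// the argument area, and no pointer is passed.
ArgArea passByVal(const double *P) {
  ArgArea Area;
  pushBytes(Area, P, sizeof(double));
  return Area;
}

// Models `double %v`: the caller stores the value itself.
ArgArea passDirect(double V) {
  ArgArea Area;
  pushBytes(Area, &V, sizeof(double));
  return Area;
}
```

In this model both forms leave the same bytes in the argument area, which is why the callee cannot tell them apart; the direct form simply avoids the extra pointer and load in the IR.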

Address @rnk's comments. Thanks!

clang/lib/CodeGen/TargetInfo.cpp
1868–1869	I can see the reason is that HFAs that are not plain FP are not a `HomogeneousAggregate` (which can possibly be passed in SSE registers) on X86. Anyway, I get your point now. Refactoring the code seems better, thanks!
clang/test/CodeGen/vectorcall.c
157	I see your point now, sounds good to me, thanks!

Harbormaster completed remote builds in B190482: Diff 465391. Oct 5 2022, 8:52 AM

Revision Contents

Path	Size
clang/lib/CodeGen/TargetInfo.cpp	46 lines
clang/test/CodeGen/vectorcall.c	29 lines

Diff 465391

clang/lib/CodeGen/TargetInfo.cpp

   if (RT) {
     if (RAA == CGCXXABI::RAA_Indirect) {
       return getIndirectResult(Ty, false, State);
     } else if (RAA == CGCXXABI::RAA_DirectInMemory) {
       // The field index doesn't matter, we'll fix it up later.
       return ABIArgInfo::getInAlloca(/*FieldIndex=*/0);
     }
   }

+  if (IsRegCall || IsVectorCall) {
     // Regcall uses the concept of a homogenous vector aggregate, similar
     // to other targets.
     const Type *Base = nullptr;
     uint64_t NumElts = 0;
-  if ((IsRegCall || IsVectorCall) &&
-      isHomogeneousAggregate(Ty, Base, NumElts)) {
+    bool IsInReg = false;
+    if (isHomogeneousAggregate(Ty, Base, NumElts)) {
       if (State.FreeSSERegs >= NumElts) {
         State.FreeSSERegs -= NumElts;
-
-      // Vectorcall passes HVAs directly and does not flatten them, but regcall
-      // does.
-      if (IsVectorCall)
-        return getDirectX86Hva();
+        IsInReg = true;
+      }
+      if (IsRegCall) {
+        if (IsInReg) {
+          // Regcall passes HVAs directly and flattens them.

           if (Ty->isBuiltinType() || Ty->isVectorType())
             return ABIArgInfo::getDirect();
           return ABIArgInfo::getExpand();
         }
+      } else {
+        // Vectorcall passes floating types directly no matter if they can be
+        // passed in register or not.
+        if (Ty->isFloatingType())
+          return ABIArgInfo::getDirect();
+        // Vectorcall passes HVAs directly and does not flatten them.
+        if (IsInReg)
+          return getDirectX86Hva();
+      }
       return getIndirectResult(Ty, /*ByVal=*/false, State);
     }
+  }

   if (isAggregateTypeForABI(Ty)) {
     // Structures with flexible arrays are always indirect.
     // FIXME: This should not be byval!
     if (RT && RT->getDecl()->hasFlexibleArrayMember())
       return getIndirectResult(Ty, true, State);

     // Ignore empty structs/unions on non-Windows.

clang/test/CodeGen/vectorcall.c

 // X86-SAME: double inreg noundef %xmm1,
 // X86-SAME: double inreg noundef %xmm2,
 // X86-SAME: double inreg noundef %xmm3,
 // X86-SAME: double inreg noundef %xmm4,
 // X86-SAME: <4 x float> inreg noundef %xmm5,
 // X86-SAME: <4 x float>* inreg noundef %0,
 // X86-SAME: i32 inreg noundef %edx,
 // X86-SAME: <4 x float>* noundef %1)

+// The passing format of floating-point types are different from vector when SSE registers exhausted.
+// They are passed indirectly by value rather than address.
+void __vectorcall vectorcall_indirect_fp(
+    double xmm0, double xmm1, double xmm2, double xmm3, double xmm4,
+    v4f32 xmm5, double stack) {
+}
+// X86: define dso_local x86_vectorcallcc void @"\01vectorcall_indirect_fp@@{{[0-9]+}}"
+// X86-SAME: (double inreg noundef %xmm0,
+// X86-SAME: double inreg noundef %xmm1,
+// X86-SAME: double inreg noundef %xmm2,
+// X86-SAME: double inreg noundef %xmm3,
+// X86-SAME: double inreg noundef %xmm4,
+// X86-SAME: <4 x float> inreg noundef %xmm5,
+// X86-SAME: double noundef %stack)

+// Make sure HFA is passed indirectly by address.
+void __vectorcall vectorcall_indirect_hfa(
+    double xmm0, double xmm1, double xmm2, double xmm3, double xmm4,
+    v4f32 xmm5, struct HFA2 hfa2) {
+}
+// X86: define dso_local x86_vectorcallcc void @"\01vectorcall_indirect_hfa@@{{[0-9]+}}"
+// X86-SAME: (double inreg noundef %xmm0,
+// X86-SAME: double inreg noundef %xmm1,
+// X86-SAME: double inreg noundef %xmm2,
+// X86-SAME: double inreg noundef %xmm3,
+// X86-SAME: double inreg noundef %xmm4,
+// X86-SAME: <4 x float> inreg noundef %xmm5,
+// X86-SAME: %struct.HFA2* inreg noundef %hfa2)
 #endif