This is an archive of the discontinued LLVM Phabricator instance.

Paths

- clang/lib/CodeGen/TargetInfo.cpp
- clang/test/CodeGen/vectorcall.c

Differential D134797

[X86][vectorcall] Make floating-type passed by value to match with MSVC
Needs Review · Public

Authored by pengfei on Sep 28 2022, 1:50 AM.


Details

Reviewers

rnk

Summary

When the SSE registers are exhausted, floating-point types are passed differently from vector types: they are passed on the stack by value (byval) rather than indirectly by address. https://godbolt.org/z/Kbs89f36P
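The rule above can be illustrated with a minimal C sketch. It assumes a deliberately simplified model of i686 `__vectorcall`: the six SSE registers XMM0-XMM5 are handed out left to right, and an argument that no longer fits is passed on the stack by value if it is a floating-point scalar, or by address if it is a vector. `ArgKind`, `PassMode` and `assign_vectorcall_args` are hypothetical names, not Clang code; the real classification also handles HVAs, integer registers and more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of i686 __vectorcall SSE-argument assignment.
   It is not Clang's implementation; it only illustrates the rule in the summary. */

enum { NUM_VECTORCALL_SSE_REGS = 6 }; /* XMM0-XMM5 */

typedef enum { KIND_FP_SCALAR, KIND_VECTOR } ArgKind;

typedef enum {
  PASS_IN_XMM,         /* passed in one of XMM0-XMM5 */
  PASS_STACK_BY_VALUE, /* FP scalar copied onto the stack (the MSVC behavior) */
  PASS_BY_ADDRESS      /* vector passed as a pointer to a caller-owned copy */
} PassMode;

/* Assigns a pass mode to each of the n arguments in kinds[], left to right. */
static void assign_vectorcall_args(const ArgKind *kinds, PassMode *modes,
                                   size_t n) {
  assert(kinds != NULL && modes != NULL);
  int free_sse = NUM_VECTORCALL_SSE_REGS;
  for (size_t i = 0; i < n; ++i) {
    if (free_sse > 0) {
      --free_sse; /* consume one XMM register */
      modes[i] = PASS_IN_XMM;
    } else if (kinds[i] == KIND_FP_SCALAR) {
      modes[i] = PASS_STACK_BY_VALUE;
    } else {
      modes[i] = PASS_BY_ADDRESS;
    }
  }
}
```

For the parameter list of `vectorcall_indirect_fp` in the test (five `double`s, one `v4f32`, then a `double`), this model gives `PASS_IN_XMM` for the first six arguments and `PASS_STACK_BY_VALUE` for the last; making the last argument a vector instead gives `PASS_BY_ADDRESS`, in line with the `<4 x float>*` parameters checked for `vectorcall_indirect_vec`.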

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pengfei created this revision. Sep 28 2022, 1:50 AM

Herald added a project: Restricted Project. Sep 28 2022, 1:50 AM

pengfei requested review of this revision. Sep 28 2022, 1:50 AM

Herald added a project: Restricted Project. Sep 28 2022, 1:50 AM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B189113: Diff 463460. Sep 28 2022, 2:19 AM

rnk added inline comments. Sep 28 2022, 11:29 AM

clang/lib/CodeGen/TargetInfo.cpp
1858–1860	I would try to refactor this so that the vectorcall HFA that can't be passed in SSE regs falls through to the following logic. I suspect that it correctly handles each case that we care about:
- double: direct
- vector: indirect for alignment
- aggregate: indirect for alignment; any HFA will presumably be aligned to more than 32 bits
clang/test/CodeGen/vectorcall.c
157	Why not pass the double directly? That should be ABI compatible: https://gcc.godbolt.org/z/W4rjn63b5

Add HFA test.

clang/lib/CodeGen/TargetInfo.cpp
1858–1860	I'm not sure I understand correctly: an HFA is not a floating type, so it won't be affected. I added a test case for it. MSVC passes it indirectly too. https://gcc.godbolt.org/z/3qf4dTYfv
clang/test/CodeGen/vectorcall.c
157	Sorry, I'm not sure what you mean here. Do you mean I should use your example as the test case? This case mimics `vectorcall_indirect_vec`, which I think is intended to check that the type, `inreg`, `byval`, etc. are generated correctly.

Harbormaster completed remote builds in B190121: Diff 464892. Oct 4 2022, 12:55 AM

rnk added inline comments. Oct 4 2022, 9:49 AM

clang/lib/CodeGen/TargetInfo.cpp
1858–1860	Thanks, I see what you mean. I thought the code for handling overaligned aggregates would trigger, passing any HFA indirectly, but it does not for plain FP HFAs. You can observe the difference by replacing `double` in HFA2 with `__int64`, and see that HFA2 is passed underaligned on the stack: https://gcc.godbolt.org/z/jqx4xcnjq

I still think this code would benefit from separating the regcall and vectorcall cases, something like:

    bool IsInReg = State.FreeSSERegs >= NumElts;
    if (IsInReg)
      State.FreeSSERegs -= NumElts;
    if (IsRegCall) {
      // handle regcall
      if (IsInReg)
        ...
    } else {
      // handle vectorcall
      if (IsInReg)
        ...
    }

They seem to have pretty different rules both when SSE regs are available and when not.
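The suggested restructuring can be modeled in a small C sketch. It is a hypothetical stand-in, not Clang's code: `ArgClass`, `CCState` and `classify_homogeneous` are invented names replacing Clang's `ABIArgInfo`, `CCState` handling and `QualType` queries. The in-register results follow the existing code in the diff below, and the out-of-register vectorcall result follows Diff 464892 as shown (floating-point scalars byval, everything else by address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Clang's ABIArgInfo results; not Clang's API. */
typedef enum {
  ARG_DIRECT_HVA,    /* vectorcall: homogeneous aggregate passed directly */
  ARG_DIRECT,        /* regcall: scalar or vector passed directly */
  ARG_EXPAND,        /* regcall: aggregate flattened into registers */
  ARG_INDIRECT,      /* passed by address */
  ARG_INDIRECT_BYVAL /* passed by value on the stack */
} ArgClass;

typedef struct {
  int free_sse_regs;
} CCState;

/* Classifies a homogeneous argument of num_elts elements: register
   availability is decided once, then each convention is handled separately. */
static ArgClass classify_homogeneous(CCState *state, bool is_regcall,
                                     bool is_builtin_or_vector,
                                     bool is_floating, int num_elts) {
  assert(state != NULL);
  bool in_reg = state->free_sse_regs >= num_elts;
  if (in_reg)
    state->free_sse_regs -= num_elts;

  if (is_regcall) {
    if (in_reg)
      return is_builtin_or_vector ? ARG_DIRECT : ARG_EXPAND;
    return ARG_INDIRECT;
  }

  /* vectorcall */
  if (in_reg)
    return ARG_DIRECT_HVA;
  return is_floating ? ARG_INDIRECT_BYVAL : ARG_INDIRECT;
}
```

Keeping the two conventions in separate branches means a later change to one of them, such as passing an out-of-register vectorcall `double` directly instead of byval, does not touch the regcall path.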
clang/test/CodeGen/vectorcall.c
157	I mean that these two LLVM prototypes are ABI compatible at the binary level for i686, but the second is much easier to optimize:

    define double @byval(double* byval(double) %p) {
      %v = load double, double* %p
      ret double %v
    }
    define double @direct(double %v) {
      ret double %v
    }

https://gcc.godbolt.org/z/MjEvdEKbT Clang should generate the second (direct) prototype.

Address @rnk's comments. Thanks!

clang/lib/CodeGen/TargetInfo.cpp
1858–1860	I see: the reason is that non-plain-FP HFAs are not treated as a `HomogeneousAggregate` (i.e., a candidate for passing in SSE registers) on X86. Anyway, I get your point now. Refactoring the code seems better, thanks!
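The distinction can be sketched with a simplified homogeneity check in C. It assumes a flat struct whose members must all share one floating-point or vector base type, with at most four members (the vectorcall HVA limit). `FieldKind` and `is_homogeneous_aggregate` are hypothetical; Clang's real `isHomogeneousAggregate` also recurses into nested records, arrays and base classes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field kinds for a flat struct; nested types are not modeled. */
typedef enum { FIELD_FLOAT, FIELD_DOUBLE, FIELD_V4F32, FIELD_INT64 } FieldKind;

enum { MAX_HVA_MEMBERS = 4 };

static bool is_fp_or_vector(FieldKind k) {
  return k == FIELD_FLOAT || k == FIELD_DOUBLE || k == FIELD_V4F32;
}

/* Returns true if all n fields share one FP/vector base type and 1 <= n <= 4. */
static bool is_homogeneous_aggregate(const FieldKind *fields, size_t n) {
  if (fields == NULL || n == 0 || n > MAX_HVA_MEMBERS)
    return false;
  if (!is_fp_or_vector(fields[0]))
    return false;
  for (size_t i = 1; i < n; ++i)
    if (fields[i] != fields[0]) /* every member must match the base type */
      return false;
  return true;
}
```

Under this model, two `double` fields qualify while two `__int64` fields do not, so the integer version never reaches the homogeneous-aggregate branch and is handled by the general aggregate logic instead, which matches the underaligned stack passing rnk observed.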
clang/test/CodeGen/vectorcall.c
157	I see your point now, sounds good to me, thanks!

Harbormaster completed remote builds in B190482: Diff 465391. Oct 5 2022, 8:52 AM

Revision Contents

Path | Size
--- | ---
clang/lib/CodeGen/TargetInfo.cpp | 3 lines
clang/test/CodeGen/vectorcall.c | 29 lines

Diff 464892

clang/lib/CodeGen/TargetInfo.cpp


… (enclosing block: if (State.FreeSSERegs >= NumElts) {)
       // does.
       if (IsVectorCall)
         return getDirectX86Hva();
 
       if (Ty->isBuiltinType() || Ty->isVectorType())
         return ABIArgInfo::getDirect();
       return ABIArgInfo::getExpand();
     }
-    return getIndirectResult(Ty, /*ByVal=*/false, State);
+    bool ByVal = IsVectorCall && Ty->isFloatingType();
+    return getIndirectResult(Ty, ByVal, State);
   }

   if (isAggregateTypeForABI(Ty)) {
     // Structures with flexible arrays are always indirect.
     // FIXME: This should not be byval!
     if (RT && RT->getDecl()->hasFlexibleArrayMember())
       return getIndirectResult(Ty, true, State);
 
     // Ignore empty structs/unions on non-Windows.
…

clang/test/CodeGen/vectorcall.c

…
 // X86-SAME: double inreg noundef %xmm1,
 // X86-SAME: double inreg noundef %xmm2,
 // X86-SAME: double inreg noundef %xmm3,
 // X86-SAME: double inreg noundef %xmm4,
 // X86-SAME: <4 x float> inreg noundef %xmm5,
 // X86-SAME: <4 x float>* inreg noundef %0,
 // X86-SAME: i32 inreg noundef %edx,
 // X86-SAME: <4 x float>* noundef %1)
+
+// The passing format of floating-point types are different from vector when SSE registers exhausted.
+// They are passed indirectly by value rather than address.
+void __vectorcall vectorcall_indirect_fp(
+    double xmm0, double xmm1, double xmm2, double xmm3, double xmm4,
+    v4f32 xmm5, double stack) {
+}
+// X86: define dso_local x86_vectorcallcc void @"\01vectorcall_indirect_fp@@{{[0-9]+}}"
+// X86-SAME: (double inreg noundef %xmm0,
+// X86-SAME: double inreg noundef %xmm1,
+// X86-SAME: double inreg noundef %xmm2,
+// X86-SAME: double inreg noundef %xmm3,
+// X86-SAME: double inreg noundef %xmm4,
+// X86-SAME: <4 x float> inreg noundef %xmm5,
+// X86-SAME: double* noundef byval(double) align 4 %0)

+// Make sure HFA is passed indirectly by address.
+void __vectorcall vectorcall_indirect_hfa(
+    double xmm0, double xmm1, double xmm2, double xmm3, double xmm4,
+    v4f32 xmm5, struct HFA2 hfa2) {
+}
+// X86: define dso_local x86_vectorcallcc void @"\01vectorcall_indirect_hfa@@{{[0-9]+}}"
+// X86-SAME: (double inreg noundef %xmm0,
+// X86-SAME: double inreg noundef %xmm1,
+// X86-SAME: double inreg noundef %xmm2,
+// X86-SAME: double inreg noundef %xmm3,
+// X86-SAME: double inreg noundef %xmm4,
+// X86-SAME: <4 x float> inreg noundef %xmm5,
+// X86-SAME: %struct.HFA2* inreg noundef %hfa2)
 #endif