This is an archive of the discontinued LLVM Phabricator instance.

[X86][vectorcall] Make floating-type passed by value to match with MSVC
Needs ReviewPublic

Authored by pengfei on Sep 28 2022, 1:50 AM.

Details

Reviewers
rnk
Summary

The passing format of floating-point types are different from vector when SSE registers exhausted.
They are passed indirectly by value rather than address. https://godbolt.org/z/Kbs89f36P

Diff Detail

Event Timeline

pengfei created this revision.Sep 28 2022, 1:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2022, 1:50 AM
pengfei requested review of this revision.Sep 28 2022, 1:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2022, 1:50 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
rnk added inline comments.Sep 28 2022, 11:29 AM
clang/lib/CodeGen/TargetInfo.cpp
1858–1860

I would try to refactor this so that the vectorcall HFA that can't be passed in SSE regs falls through to the following logic. I suspect that it correctly handles each case that we care about:

  • double: direct
  • vector: indirect for alignment
  • aggregate: indirect for alignment, any HFA will presumably be aligned to more than 32bits
clang/test/CodeGen/vectorcall.c
157

Why not pass the double directly? That should be ABI compatible:
https://gcc.godbolt.org/z/W4rjn63b5

pengfei updated this revision to Diff 464892.Oct 3 2022, 11:44 PM

Add HFA test.

clang/lib/CodeGen/TargetInfo.cpp
1858–1860

Not sure if I understand it correctly, the HFA is not a floating type, so it won't be affected. Add a test case for it.
MSVC passes it indirectly too. https://gcc.godbolt.org/z/3qf4dTYfv

clang/test/CodeGen/vectorcall.c
157

Sorry, I'm not sure what's your mean here. Do you mean I should use your example as the test case? Here the case mocked vectorcall_indirect_vec, which I think is intended to check if the type, inreg and byval etc. are generated correctly.

rnk added inline comments.Oct 4 2022, 9:49 AM
clang/lib/CodeGen/TargetInfo.cpp
1858–1860

Thanks, I see what you mean. I thought the code for handling overaligned aggregates would trigger, passing any HFA indirectly, but it does not for plain FP HFAs. You can observe the difference by replacing double in HFA2 with __int64, and see that HFA2 is passed underaligned on the stack:
https://gcc.godbolt.org/z/jqx4xcnjq

I still think this code would benefit from separating the regcall and vectorcall cases, something like:

bool IsInReg = State.FreeSSERegs >= NumElts;
if (IsInReg)
  State.FreeSSERegs -= NumElts;
if (IsRegCall) {
  // handle regcall
  if (IsInReg)
     ...
} else {
  // handle vectorcall
  if (IsInReg)
    ...
}

They seem to have pretty different rules both when SSE regs are available and when not.

clang/test/CodeGen/vectorcall.c
157

I mean that these two LLVM prototypes are ABI compatible at the binary level for i686, but the second is much easier to optimize:

double @byval(double* byval(double) %p) {
  %v = load double, double* %p
  ret double %v
}
double @direct(double %v) {
  ret double %v
}

https://gcc.godbolt.org/z/MjEvdEKbT

Clang should generate the prototype.

pengfei updated this revision to Diff 465391.Oct 5 2022, 7:31 AM

Address @rnk's comments. Thanks!

clang/lib/CodeGen/TargetInfo.cpp
1858–1860

I can see the reason is non plain FP HFAs are not HomogeneousAggregate (possible be passed by SSE) on X86.
Anyway, I got your point now. Refactor the code seems better, thanks!

clang/test/CodeGen/vectorcall.c
157

I see your point now, sounds good to me, thanks!