Floating-point types are passed differently from vector types once the SSE registers are exhausted.
They are passed indirectly by value rather than by address. https://godbolt.org/z/Kbs89f36P
Details
- Reviewers
rnk
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
1858–1860 | I would try to refactor this so that the vectorcall HFA that can't be passed in SSE regs falls through to the following logic. I suspect that it correctly handles each case that we care about:
| |
clang/test/CodeGen/vectorcall.c | ||
157 | Why not pass the double directly? That should be ABI compatible: |
Add HFA test.
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
1858–1860 | I'm not sure I understand this correctly: the HFA is not a floating-point type, so it won't be affected. I added a test case for it. | |
clang/test/CodeGen/vectorcall.c | ||
157 | Sorry, I'm not sure what you mean here. Do you mean I should use your example as the test case? This case mirrors vectorcall_indirect_vec, which I think is intended to check that the type, inreg, byval, etc. are generated correctly. |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
1858–1860 | Thanks, I see what you mean. I thought the code for handling overaligned aggregates would trigger, passing any HFA indirectly, but it does not for plain FP HFAs. You can observe the difference by replacing double in HFA2 with __int64, and see that HFA2 is passed underaligned on the stack: I still think this code would benefit from separating the regcall and vectorcall cases, something like: `bool IsInReg = State.FreeSSERegs >= NumElts;` `if (IsInReg) State.FreeSSERegs -= NumElts;` `if (IsRegCall) { /* handle regcall */ if (IsInReg) ... } else { /* handle vectorcall */ if (IsInReg) ... }` They seem to have pretty different rules both when SSE regs are available and when not. | |
clang/test/CodeGen/vectorcall.c | ||
157 | I mean that these two LLVM prototypes are ABI compatible at the binary level for i686, but the second is much easier to optimize: `double @byval(double* byval(double) %p) { %v = load double, double* %p ret double %v }` and `double @direct(double %v) { ret double %v }` https://gcc.godbolt.org/z/MjEvdEKbT Clang should generate the second prototype. |
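The control flow suggested in the inline comment on TargetInfo.cpp above can be sketched as a small standalone model. This is only an illustration of the vectorcall branch as described in this review (use SSE registers while they last; otherwise pass floating-point values directly and vectors by address). `ArgShape`, `Passing`, `CCState`, and `classifyVectorCallArg` are hypothetical names, not clang's real types or functions, and the regcall branch and HFA handling are deliberately left out because the thread does not spell out their rules.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not clang's real types.
enum class ArgShape { FloatingPoint, Vector };
enum class Passing { InSSERegs, DirectOnStack, IndirectByAddress };

struct CCState {
  unsigned FreeSSERegs; // SSE registers still available for arguments
};

// Models only the vectorcall branch. The regcall branch from the review
// pseudocode is not modeled here (assumption: its rules differ).
Passing classifyVectorCallArg(CCState &State, ArgShape Shape,
                              unsigned NumElts) {
  // Register accounting, following the reviewer's pseudocode.
  bool IsInReg = State.FreeSSERegs >= NumElts;
  if (IsInReg) {
    State.FreeSSERegs -= NumElts;
    return Passing::InSSERegs;
  }

  // SSE registers exhausted: a floating-point value goes on the stack by
  // value, while a vector is passed indirectly by address.
  if (Shape == ArgShape::FloatingPoint)
    return Passing::DirectOnStack;
  return Passing::IndirectByAddress;
}
```

With one free register, a first vector argument would take it, after which a `double` would be classified as `DirectOnStack` and a further vector as `IndirectByAddress`.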
Address @rnk's comments. Thanks!
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
1858–1860 | I can see the reason now: HFAs that are not plain FP are not treated as HomogeneousAggregate (which could possibly be passed in SSE registers) on X86. | |
clang/test/CodeGen/vectorcall.c | ||
157 | I see your point now, sounds good to me, thanks! |