Related llvm patch: D60348.
Patch co-authored by Sanjin Sijaric.
Details
- Reviewers
rnk efriedma TomTan ssijaric ostannard - Commits
- rZORGd9f52f48dc51: [COFF, ARM64] Fix ABI implementation of struct returns
rZORG357b29cc2cf8: [COFF, ARM64] Fix ABI implementation of struct returns
rGd9f52f48dc51: [COFF, ARM64] Fix ABI implementation of struct returns
rG357b29cc2cf8: [COFF, ARM64] Fix ABI implementation of struct returns
rG85a0f8fe6c5c: [COFF, ARM64] Fix ABI implementation of struct returns
rL359932: [COFF, ARM64] Fix ABI implementation of struct returns
rC359932: [COFF, ARM64] Fix ABI implementation of struct returns
Diff Detail
Event Timeline
Got rid of the confusing SuppressSRet logic. The "inreg" attribute is now used to indicate sret returns in X0.
The document you linked in the LLVM change (https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values) says that small POD types are returned directly in X0 or X0 and X1, but this looks like it will always return them indirectly. I think we also need to check the size of the type, and fall back the the plain C ABI for small types.
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1060 | Nit: this will use X1 for functions with a this parameter, not X0. | |
1062 | This should also check aarch64_be. |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1062 | aarch64_be-windows isn't a thing; it doesn't make sense to check for that explicitly. That said, there is an isAArch64() helper which is probably more readable anyway. |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1058 | Microsoft have updated the spec since this was written, it now says to check for aggregate-ness, trivial copy, and trivial destruct, instead of POD. |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1058 | This is only true for aggregates. We can have non-aggregate return types that are less or equal to 16 bytes. In this case, they are returned on the stack. | |
1075 | The size returned is in bits, not bytes. As mentioned above, this applies to aggregates with trivial copy assignment operators and destructors. I will provide a function to check for this. At that point, the check can be removed from here, and the check below can be replaced with something like: bool isIndirectReturn = !canReturnInRegisters(); |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1077 | Based on https://reviews.llvm.org/D60348 - lib/Target/AArch64/AArch64CallingConvention.td (line 44), should this be (isIndirectReturn || isInstanceMethod)? |
lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
5906 ↗ | (On Diff #196109) | The isAggregate predicate is not appropriate to check here; it varies depending on whether the language version is C++14 or C++17. For example, C++17 aggregates are allowed to have base classes. You have to split out checking the relevant properties (getNumBases(), isDynamicClass(), etc.). |
lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
5904 ↗ | (On Diff #196144) | I'm not entirely sure it makes sense to do all of these Windows-specific checks here... I think it makes trivial_abi not work the way it should. But that's not something we need to worry about for the initial patch, I think. |
5916 ↗ | (On Diff #196144) | D->getAccess() is the access specifier for the class itself (if the class is nested inside another class), not the access specifier for any of its members. I think you're looking for HasProtectedFields and HasPrivateFields. (I think there currently isn't any getter on CXXRecordDecl for them at the moment, but you can add one.) |
5922 ↗ | (On Diff #196144) | Why are you checking needsImplicitCopyAssignment()? |
lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
5908 ↗ | (On Diff #196144) | I think these checks are in the wrong place - the aggregate restriction applies to return values, but it doesn't apply to arguments. Placing this logic here breaks argument passing between MSVC and Clang, because Clang (with this logic in place) will pass and expect arguments with an extra layer of indirection: for example, when passing https://cs.chromium.org/chromium/src/v8/include/v8.h?q=v8::Local&sq=package:chromium&g=0&l=183 v8::Local, what you actually see in the register bank is supposed to be its only private member. Moving the logic to MicrosoftCXXABI and applying it only to the return value resolves this problem. |
5916 ↗ | (On Diff #196144) | So the issue that I mentioned in https://bugs.llvm.org/show_bug.cgi?id=41135#c18 is resolved by checking HasProtectedFields/HasPrivateFields. |
It looks like there's some missing documentation in the ARM64 ABI document involving homogeneous aggregates... in particular, it looks like non-POD types are never homogeneous, or something along those lines. I guess we can address that in a followup, though.
@TomTan could you look into updating the ARM64 ABI documentation?
Testcase:
struct Pod { double b[2]; }; struct NotAggregate { NotAggregate(); double b[2]; }; struct NotPod { NotAggregate x; }; Pod copy(Pod *x) { return *x; } // ldp d0,d1,[x0] NotAggregate copy(NotAggregate *x) { return *x; } // stp x8,x9,[x0] NotPod copy(NotPod *x) { return *x; } // ldp x0,x1,[x8]
lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
5956 ↗ | (On Diff #196342) | just did some quick tests; I think this rule does in fact apply to AArch64, except that the limit is 128 bits, instead of 64 bits. |
@efriedma from your test sample above, which part is missing in the ARM64 ABI document? For NotPod, it is aggregate which is specific in the document.
For NotPod, it is aggregate which is specific in the document
Yes, it's an aggregate which is returned in registers... but it's returned in integer registers, unlike Pod which is returned in floating-point registers.
struct Pod is HFA (Homogenous Floating-Point Aggregates) which is returned in floating-point register, struct NotPod is not HFA so still returned in integer registers. The ARM64 ABI document will be updated to reflect this, thanks.
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1070 | Should this function also check for user-provided constructors? |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1070 | I think it should: I speculatively added these two lines if (RD->hasUserDeclaredConstructor()) return true; and it resolved the problem with std::setw I mentioned in the bug tracker (which means Electron could start). |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1070 | The comment says it's supposed to check for user-provided constructors, but there isn't any code to perform that check, yes. hasUserDeclaredConstructor() isn't precisely correct; from the C++ standard, "A function is user-provided if it is user-declared and not explicitly defaulted or deleted on its first declaration." |
@efriedma Return info for HFA and HVA is updated in https://github.com/MicrosoftDocs/cpp-docs/pull/970.
Return info for HFA and HVA is updated
That's helpful, but not really sufficient; according to the AAPCS rules, both "Pod" and "NotPod" from my testcase are HFAs.
Could you provide some more details on why NotPod is HFA, according to AAPCS?
In general, computing whether a composite type is a "homogeneous aggregate" according to section 4.3.5 depends only on the "fundamental data types". It looks through "aggregates" (C structs/C++ classes), unions, and arrays. The test applies "without regard to access control or other source language restrictions". Section 7.1.6 mentions there are additional rules that apply to non-POD structs, but that's just referring to the Itanium C++ ABI rule that types which are "non-trivial for the purposes of calls" are passed and returned indirectly. Both clang and gcc agree that "NotPod" is an HFA on AArch64 Linux.
So the question is, what rule is MSVC using to exclude "NotPod" from being an HFA?
Confirmed just now that the current diff (196774) does work, but it'd be good to correct user-provided constructors and trivial destructors before this lands.
The current diff (196894) seems to have the same std::setw issue as the previous one. Here's what it's trying to compile:
_MRTIMP2 _Smanip<streamsize> __cdecl setw(streamsize wide) { // manipulator to set width return (_Smanip<streamsize>(&swfun, wide)); }
Here's the definition for _Smanip:
template<class _Arg> struct _Smanip { // store function pointer and argument value _Smanip(void (__cdecl *_Left)(ios_base&, _Arg), _Arg _Val) : _Pfun(_Left), _Manarg(_Val) { // construct from function pointer and argument value } void (__cdecl *_Pfun)(ios_base&, _Arg); // the function pointer _Arg _Manarg; // the argument value };
Apologies, I meant to write "here's what Clang is trying to call" in the previous comment. It's clear from looking at MSVC's output that MSVC expects to return indirectly via X0, implying that _Smanip is not aggregate by MSVC's definition.
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1076 | So I think that the problem with this new check is that it doesn't check all of the constructors. I replaced it with for (auto it = RD->ctor_begin(); it != RD->ctor_end(); ++it) { if (it->isUserProvided()) return true; } And that seems to resolve the setw problem. |
I'd like to see a few more tests to cover the interesting cases that came up during development of this patch:
- The user-provided constructor issue: there should be testcases with an explicit user-provided constructor, a non-trivial constructor that's explicitly defaulted, and a non-trivial constructor that's implicitly defaulted.
- The difference between a user-defined and a non-trivial destructor: there should be a testcase with an implicit non-trivial destructor (which is non-trivial because a member has a non-trivial destructor).
- A testcase with a trivial copy constructor, but a non-trivial copy-assignment operator.
- A testcase with a struct with a base class.
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1078 | I'm not sure what the IsSizeGreaterThan128 check is doing here - if the return type is over 128 bits, then it will be indirectly returned in X8 with this check, which is not always what we want (e.g. in https://bugs.llvm.org/show_bug.cgi?id=41135 ostream.cpp, MSVC expects the value returned in X1). Another check of hasMicrosoftABIRestrictions might be OK, but that's also not quite right due to the size check. |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1078 | I think the check here is intended to counteract the similar check in hasMicrosoftABIRestrictions... but yes, I don't think that works correctly. Looks like we're missing a testcase for a non-aggregate type with size greater than 128 bits. |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1078 | Ah, OK. I think the problem is that we need to check whether the return value meets the rest of aggregate definition (i.e. !hasMicrosoftABIRestrictions), as well as being more than 128 bits in size. May be worth splitting the logic up. Will experiment. |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1078 | I think you might get the right behavior by just erasing both IsSizeGreaterThan128 checks. That matches the way the ABI document describes the rules. Haven't tried it, though. |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
55–56 | These are both trivial wrappers for RD->canPassInRegisters() which has the real logic. I would just call that to remove some indirection, even if the name is the logical opposite of the property we're trying to check. | |
1075 | Can this be just: for (const CXXConstructorDecl *Ctor : RD->ctors())? |
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1078 | So I tried that, and everything mostly worked, but there are still some crashes in electron. I think the solution will be something like:
If both of those things are are true (i.e. it's an aggregate more than >128bits) we should end up calling FI.getReturnInfo().setInReg(false), which implies a return in x8. I haven't confirmed that this works yet because I've run out of time today, but the result should be in tomorrow. May need similar logic on the outer if. |
And with those modifications, everything seems to work correctly. I'd be fine with the LLVM portion landing at this stage, but one more rev and another test cycle for this patch would be ideal.
lib/CodeGen/MicrosoftCXXABI.cpp | ||
---|---|---|
1070 | I also had to amend this to: bool isIndirectReturn = isAArch64 ? (passClassIndirect(RD) || hasMicrosoftABIRestrictions(RD) || IsSizeGreaterThan128(RD)) : !RD->isPOD(); | |
1078 | ... and also amend this to FI.getReturnInfo().setInReg(isAArch64 && !(isAggregate && IsSizeGreaterThan128(RD))); | |
1081 | So... to get the latest diff working, I had to remove this check... |
Thanks Eli. @richard.townsend.arm Can you please confirm whether Electron works with this patch and D60348?
Just completed testing... everything seems to be working correctly. So long as the all the tests pass, let's commit! :D
These are both trivial wrappers for RD->canPassInRegisters() which has the real logic. I would just call that to remove some indirection, even if the name is the logical opposite of the property we're trying to check.