This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Use correct calling convention for each vararg
ClosedPublic

Authored by lephilousophe on Feb 26 2022, 1:13 PM.

Details

Summary

While checking is tail call optimization is possible, the calling
convention applied to fixed arguments is not the correct one.
This implies for DarwinPCS that all arguments of a vararg function will
go to the stack although fixed ones can go in registers.

This prevents non-virtual thunks to be tail optimized although they are marked
as musttail.

Diff Detail

Event Timeline

lephilousophe created this revision.Feb 26 2022, 1:13 PM
lephilousophe requested review of this revision.Feb 26 2022, 1:13 PM

No objection wrt the windows code (as the tests still are passing), but I'd rather have someone more familiar with this code give an authoritative review.

Thank you for the rerouting.

While being at it, I had to duplicate the code twice in the same function and I wonder if:

  • it could be reworked to avoid this
  • the first test is really useful, it seems contradictory with the second one (if all arguments are in registers, why the stack space should matter?)
rnk added a comment.Feb 28 2022, 10:04 AM

I think AArch64 maintainers ought to reconsider the need to run TCO eligibility checks when musttail is set for the call. We have this code in the X86 backend before checking eligibility (file is too large for github to link to and show):

if (isTailCall && !IsMustTail) {
  // Check if it's really possible to do a tail call.
  isTailCall = IsEligibleForTailCallOptimization(

In AArch64, we have this code:

if (!IsTailCall && CLI.CB && CLI.CB->isMustTailCall())
  report_fatal_error("failed to perform tail call elimination on a call "
                     "site marked musttail");

Basically, on x86, musttail bypasses the majority of the checking. The verifier requirements for musttail are pretty strict: the prototypes must match, including ABI attributes, which is usually enough for the backend to actually implement TCO. This patch isn't making any codegen changes here, so clearly the backend can already handle this case.

The alternative is that we have to add a continuing number of special cases like the ones being added in this code.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5993

This seems like a bug for musttail. We could set up a virtual thunk with a byval argument, and I believe that would result in the relevant backend error.

6002

Seems like another musttail bug for similar reasons.

6020

This is an interesting check. If the user sets up a musttail call to an extern weak function, it's not clear what should happen. A codegen error seems reasonable.

6075

Suppose we had a musttail call passing such SVE arguments. Do we still need this legality check? If the prototypes match, won't the size of the allocated bytes be the same? For a given program, all SVE values have the same size at runtime, right? (forgive my ignorance)

Assuming yes, this represents another check that we would want to bypass for musttail calls. You can create a vtable thunk function for a method that passes SVE values to try to exercise this corner case.

6093

Similarly, the ABI attribute checks in the verifier for musttail calls should imply that this check will always pass, we will never fail to TCO a musttail call for this reason, right?

rnk added inline comments.Feb 28 2022, 12:07 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5993

So far I have failed to convince clang to use the byval attribute when generating code for AArch64, so maybe this is irrelevant.

This patch isn't making any codegen changes here, so clearly the backend can already handle this case.

In fact I just copied the piece of code from the AArch64TargetLowering::LowerCall function.
By doing this I believe I just make things consistent between whether the TCO could take place and when TCO is done.

Now a noob question related to your comments.
I read that there is now a clang attribute to mark calls as musttail. Isn't these checks necessary for the case where user can ask for a tail call when it's not possible?

rnk added a comment.Feb 28 2022, 3:41 PM

Now a noob question related to your comments.

It's a reasonable question. :)

I read that there is now a clang attribute to mark calls as musttail. Isn't these checks necessary for the case where user can ask for a tail call when it's not possible?

The frontend has a set of safety checks for the musttail attribute. If those pass, the IR verifier checks should also pass, meaning the IR prototypes should match. If the IR prototypes match, the backend should be able to handle it. If the fatal error I quoted earlier ever executes, that's a bug. I think we can fix more bugs than we introduce by removing that fatal error. There may be some constructs that AArch64 tail call codegen can't handle yet, but we need to fix them anyway.

Is there something I could do to address all your comments?
Should I proceed, add the musttail check in LowerCall and remove the superfluous checks in IsEligibleForTailCallOptimization?

Is there other reviewers to add?

rnk added a subscriber: efriedma.Mar 1 2022, 1:29 PM

Ideally, I'd like to hear from @t.p.northover , since I think he added AArch64 musttail support, if I'm not mistaken. Maybe @efriedma can give input, I think we've discussed this feature and it's limitations before.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 1:29 PM

In general, if we need to skip certain checks for musttail calls, I'd prefer to do that explicitly, check-by-check. Even for cases that "should" work, they require extra implementation work in some cases.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5993

This is a missing feature. And yes, I don't think clang uses byval on AArch64 in practice.

6002

In general, we can't musttail in this case: x0 wouldn't be set correctly. From a C++ source code perspective, this should work because the caller and callee should use the same sret pointer value, but the IR verifier won't check that? Maybe it should?

For a practical example where this shows up, try the following targeting aarch64-pc-windows-msvc:

struct A { A(const A&); };
A f();
A g() { [[clang::musttail]] return f(); }
6020

We might be able to turn it into an indirect call...? Not sure.

6032–6033

Can you refactor this loop into a separate helper function, instead of copy-pasting it?

I think we need additional handling for musttail if there are more fixed arguments than fit in registers, but I guess we can leave that for a followup.

6075

For a musttail call, we should be able to reuse stack from the caller, but I think we'd need to explicitly implement that: we'd need to mess with the code that would normally allocate in the current stack frame.

6093

This naturally shouldn't fail for musttail calls, I think.

llvm/test/CodeGen/AArch64/darwinpcs-musttail.ll
17 ↗(On Diff #411639)

Can you add a plain "tail" testcase in addition to the "musttail" testcase, since this patch affects the behavior of that too?

rnk added inline comments.Mar 2 2022, 2:18 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5993

I came across this thread in my search, suggesting that byval wasn't implemented for aarch64 globalisel:
https://lists.llvm.org/pipermail/llvm-dev/2021-March/149031.html

6002

The same issue exists for x86 in RAX, right? I don't think we can write a verifier check for it, since it would make it invalid to obscure the sret pointer by spilling it to memory and reloading it. Obfuscators do this.

It seems like an existing issue that we should document as undefined behavior. If the sret pointer parameter isn't forwarded to a musttail call site, that's UB. It's more of a frontend concern than a user concern.

6020

On reflection, I think this is OK: if the user marks a direct call to an extern_weak function musttail, it's more reasonable to get a linker error or a runtime crash if that function is undefined during the link. There's just no way to honor the musttail marker and get the no-op behavior from the linker.

6032–6033

I think it's likely that no additional support is required for musttail. If we support passing memory arguments to other guaranteed TCO conventions like swifttailcc and tailcc, musttail should follow the same codepaths which overwrite the incoming memory arguments with the outgoing memory arguments before executing the epilogue. Having extra variadic arguments shouldn't matter, because when the prototypes match, it should not require allocating additional argument stack memory.

6055

This is the same ArgLocs computation, I think we could calculate it earlier and save it and reuse it, right?

6075

So, this *is* an interesting check. Without this check, we presumably allocate local stack memory, and pass a pointer to it to the tail call, which effectively creates a use-after-return bug. That's a real blocker.

efriedma added inline comments.Mar 2 2022, 3:01 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6002

Yes, it the same issue as "RAX" on x86. I'm okay with runtime UB if we don't think it's a rule the verifier can reasonably verify. We still need to document the relevant rule in LangRef.

6032–6033

Oh, I see what you mean. Yes, musttail should just work if we skip this check.

Actually, taking another look, I'm not sure why we're special-casing varargs calls here in the first place; the CCInfo.getNextStackOffset() > FuncInfo->getBytesInStackArgArea() check later should catch the problematic cases, I think. Maybe I'm forgetting something.

rnk added a comment.Mar 2 2022, 3:54 PM

I kind of side tracked the review, sorry about that. What's the next step here? Is it worth taking the time to rewrite this logic to refactor it and divide the musttail-relevant checks from the regular checks, or should we go forward with improving this patch?

I think I don't like the complexity in this patch. Ideally, I would like to declare non-musttail vararg tail calls to be unsupported: we shouldn't have to forward variadic arguments between functions with mismatched prototypes. To me, that's what's adding a lot of complexity in the eligibility checks here. One of the major use cases for the musttail feature is to make it *possible * to forward variadic argument packs because we don't have to worry about moving around arguments that live in memory.

I kind of side tracked the review, sorry about that. What's the next step here? Is it worth taking the time to rewrite this logic to refactor it and divide the musttail-relevant checks from the regular checks, or should we go forward with improving this patch?

This patch looks more complicated than it actually is because it's copy-pasting code. The new code is doing exactly the same thing as the old code, except it's using the target-specific form of AnalyzeCallOperands.

Like I said before, I'd rather keep the codepaths unified if possible. If we need to skip some specific check for musttail, we can do that explicitly. There are multiple cases where musttail actually doesn't work, and a compiler crash in those cases seems better than silently miscompiling code.

I think I don't like the complexity in this patch. Ideally, I would like to declare non-musttail vararg tail calls to be unsupported: we shouldn't have to forward variadic arguments between functions with mismatched prototypes.

Forwarding variadic arguments isn't a thing outside of musttail. For non-musttail calls, whether the call is variadic doesn't really change the necessary checks.

I factorized the code to have a unique function in charge to calculate the arguments locations.

lephilousophe added inline comments.Mar 4 2022, 6:23 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6032–6033

I refactored the code in a separate function and merged in it varargs and non varargs case.
I don't think it would change the behaviour but it seems more correct to me that the checks in isEligibleForTailCallOptimization and in LowerCall are the same.

6055

I did it and tests pass.
By doing this, I relocated both checks at the same place it seemed to me it wasn't changing anything.

llvm/test/CodeGen/AArch64/darwinpcs-musttail.ll
17 ↗(On Diff #411639)

I don't see how ot do this test as tail calls don't support ellipses at the end to call back varargs.

Update to please clang-format

efriedma added inline comments.Mar 4 2022, 9:46 AM
llvm/test/CodeGen/AArch64/darwinpcs-musttail.ll
17 ↗(On Diff #411639)

tail call void (%class.C*, i8*, ...) @_ZN1C3addEPKcz(%class.C* noundef nonnull align 8 dereferenceable(28) undef, i8* noundef %1) should trigger the relevant codepath, I think?

Added a tail test case

lephilousophe added inline comments.Mar 4 2022, 1:35 PM
llvm/test/CodeGen/AArch64/darwinpcs-musttail.ll
17 ↗(On Diff #411639)

Indeed. Thanks!
I renamed the file name in the same time.

clang-format fixes again

@efriedma is there something else to do to get this merged?
Thanks.

efriedma accepted this revision.Mar 8 2022, 11:54 AM

LGTM. Sorry about the delay.

@lephilousophe I assume you want me to commit this for you; how should I credit you in the "author" line of the git commit?

@rnk any last comments before I merged?

This revision is now accepted and ready to land.Mar 8 2022, 11:54 AM

LGTM. Sorry about the delay.

@lephilousophe I assume you want me to commit this for you; how should I credit you in the "author" line of the git commit?

You can use Philippe Valembois <lephilousophe@users.noreply.github.com>.

rnk added a comment.Mar 8 2022, 4:19 PM

Thanks, this looks pretty good, but of course I dug into some edge cases.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5909

Does this need to be a class method? Can it be a static helper with a TLI parameter?

5910–5911

CLI contains isVararg, CalleeCC, and Outs, I believe

6068

We should skip this check for musttaill calls (CLI.CB && CLI.CB->isMustTailCall())

6077

I think we'll still come here and have the same bug for variadic functions with 9 parameters when one of the fixed params is passed in memory, consider:
https://gcc.godbolt.org/z/Pb5EoK1d1

struct A { virtual void f(int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7, int mem, ...); };
struct B { virtual void f(int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7, int mem, ...); };
struct C : A, B {
    virtual void key();
    virtual void f(int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7, int mem, ...);
};
6189–6190

If you pass CLI here, the remaining parameters are redundant. All the parameters Outs, OutVals, Ins, CallConv, IsVarArg are members of CLI.

Update to have a static function.
A test is added so musttail can be used even when fixed arguments are in stack.

@rnk I think I addressed all your comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5910–5911

Indeed.
This is now using CLI.
Subtarget is private in TLI so I pass it as an argument.

6068

Done.

6077

I added your test case to the regression tests.

6189–6190

Indeed. This is fixed.
Except that CallConnv was modified in the function and CLI didn't get updated.
I used a reference instead and CLI gets updated.
Tests are passing so I expect this change doesn't break anything.

rnk accepted this revision.Mar 10 2022, 10:56 AM

lgtm, thanks!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6069

I strongly suspect it would be acceptable to simplify this to if (!Outs.empty()) return false;, but I don't want to insist.

llvm/test/CodeGen/AArch64/darwinpcs-tail.ll
35

Thanks for testing it.

This revision was landed with ongoing or failed builds.Mar 10 2022, 3:10 PM
This revision was automatically updated to reflect the committed changes.