Page MenuHomePhabricator

[COFF, ARM64] Fix ABI implementation of struct returns
ClosedPublic

Authored by mgrang on Apr 5 2019, 5:32 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

mgrang created this revision.Apr 5 2019, 5:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2019, 5:32 PM
mgrang added a comment.Apr 5 2019, 5:39 PM

Refer https://bugs.llvm.org/show_bug.cgi?id=41135 for a detailed discussion on the issue. Please review this and the related clang patch and let me know if this implementation is something we can proceed with.

Note: In case, for instance methods (when "this" is passed in x0), we would need to do a mov from x1 to get the sret parameter. I have not yet implemented this. Will do so once the reviewers +1 the implementation.

mgrang updated this revision to Diff 194616.Apr 10 2019, 5:27 PM

The new patch now also handles struct returns for instance methods.

Hi Mandeep,

I had a quick look through and wrote up some thoughts that triggered immediately.
I'll try to have a further look when I find more time.

Thanks!

lib/Target/AArch64/AArch64ISelLowering.cpp
3212 ↗(On Diff #194616)

As said elsewhere, I don't think the note is that helpful after this gets committed.

3219 ↗(On Diff #194616)

I wonder if the a more exact word could be used than "non-POD"?
The updated documentation at https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values doesn't describe the rules in terms of POD anymore. I wonder if a more exact term could be used here than "non-POD"? I realize that may be hard... Maybe just drop the word "non-POD", as it probably wouldn't remove anything from the clarity of the explanation in the comment?

3223 ↗(On Diff #194616)

I needed to browse around the surrounding code quite a bit to understand what "I == 1" really means.
I guess it means "this is the second argument to the function". If my guess is correct, maybe it'd be useful to spell it out in the comment?

3226 ↗(On Diff #194616)

I wonder if it'd be a good idea to add a pointer in the comment to the relevant MSVC ABI documentation indicating that the struct pointer must be returned in X0?
I think that would be https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values?

4007 ↗(On Diff #194616)

This note is helpful for review, indicating it's based on X86 implementation. However, I don't think it adds too much value if it'd be committed as is to the code base.

lib/Target/AArch64/AArch64MachineFunctionInfo.h
94–98 ↗(On Diff #194616)

It seems this is also present in X86MachineFunctionInfo.
I wonder if it wouldn't be better to move this to the MachineFunctionInfo base class so we don't need to add an identical interface in 2 derived classes?

mgrang updated this revision to Diff 194728.Apr 11 2019, 11:43 AM
mgrang marked 6 inline comments as done.Apr 11 2019, 11:47 AM
mgrang added inline comments.
lib/Target/AArch64/AArch64MachineFunctionInfo.h
94–98 ↗(On Diff #194616)

MachineFunctionInfo base only has a factory function for creating objects. All fields/methods are in derived classes. So not sure if we should move SRetReturnReg to the base.

mgrang updated this revision to Diff 194951.Apr 12 2019, 1:37 PM

Hi Mandeep,

I spent a bit more time understanding the code in more depth now.
I have a few more remarks that I hope will help to make the code simpler and more understandable.
Overall, the design seems acceptable to me - especially as it is in line with how things are done in the X86 backend. It's good that we don't unnecessarily use more different ways to implement things.
One big remark though: I think this needs regression tests to be added that should cover the lines of code you're introducing in this patch.

I expect to be offline for the next 7 days, so if anyone else would want to review/approve an updated version of this patch: I'd also be happy with that if the above comments have been taken into account.

lib/Target/AArch64/AArch64CallingConvention.td
19–23 ↗(On Diff #194951)

Is CCIfNotSubtarget used anywhere? If not, this can be removed?

44–46 ↗(On Diff #194951)

Based on my understanding of this code, I think the following comment would make it easier to understand, by explaining what is aimed for without making the comment only about windows systems:

// In AAPCS, an SRet is passed in X8, not X0 like a normal pointer parameter.
// However, on windows, in some circumstances, the SRet is passed in in X0 or X1 instead.
// The presence of the inreg attribute indicates that SRet is passed in the alternative register (X0 or X1), not X8:
//  - X0 for non-instance methods.
//  - X1 for instance methods.
47–52 ↗(On Diff #194951)

Assuming the proposed extended comment/documentation is correct, I think it would be easier to follow the logic if "CCIfInReg" rather than "CCIfNotInReg" was used to express the logic (one negation less). Do you think that would be possible?

47–52 ↗(On Diff #194951)

It seems to me that the case where the SRet parameter is passed in X0 isn't explicitly handled here.
Should it be?
If not, I think the comment above should be extended to explain how that case is handled elsewhere.

lib/Target/AArch64/AArch64ISelLowering.cpp
3211–3212 ↗(On Diff #194951)

I get the impression that this doesn't need to be protected by an "IsWin64" test.
The sret inreg combination will only be produced by a front-end when targeting windows. So maybe best to just test for "sret" and "inreg" attributes to alter behaviour, not also on the "IsWin64" predicate?

Or maybe this block of code is there to make sure that on Windows the Sret pointer does get returned (in X0). And there is no other way to get the information that the sret pointer must be returned, apart from there being an "sret inreg" function parameter present?
If that's the case, I think that extending the comment "Struct returns on Windows" to e.g. "On windows, Sret pointers must be returned, so record the pointer in a virtual register at the start of the function so it can be returned in the epilogue", makes the intent of this code block a bit easier to understand. Having looked into X86ISelLowering for equivalent functionality, I saw the following words were used in a comment there, that I thought was especially clear: "Save the argument into a virtual register so that we can access it from the return points.".

3217–3225 ↗(On Diff #194951)

Assuming my understanding on the intent of this code is correct (i.e. "Save the argument into a virtual register so that we can access it from the return points."), there is no need to make a distinction between whether the sret pointer is passed in in X0 or X1 - it needs to be returned via X0 in both cases.
So, why not just have a very simple loop here that looks for the presence of an argument with both "sret" and "inreg" attributes and save the value of that argument in the getSRetReturnReg?
I think the loop could look almost identical to how similar functionality is implemented in X86ISelLowering with quite a bit less code - making this quite a bit easier to understand and maintain?

4001–4004 ↗(On Diff #194951)

Similar comment to the above: maybe it's not necessary to check for IsWin64 here - just checking on whether getSRetReturnReg is set could be enough?
It would make the code less complex.
The comment could be extended a bit to make this a bit more clear - I thought the equivalent comment on X86IselLowering was pretty clear. Maybe something like:

// Windows AArch64 ABIs require that for returning structs by value we copy
// the sret argument into X0 for the return.
// We saved the argument into a virtual register in the entry block,
// so now we copy the value out and into X0.
lib/Target/AArch64/AArch64MachineFunctionInfo.h
94–98 ↗(On Diff #194616)

I see - thanks for explaining!

mgrang updated this revision to Diff 195896.Apr 19 2019, 12:20 PM

Thanks @kristof.beyls for the review. We are still working on the combining the clang patches which should fix PR41135 and PR41136. I will then add unit tests to this patch.

mgrang marked 7 inline comments as done.Apr 19 2019, 12:22 PM
rnk added a comment.Apr 19 2019, 1:34 PM

I expect to be offline for the next 7 days, so if anyone else would want to review/approve an updated version of this patch: I'd also be happy with that if the above comments have been taken into account.

Fair enough, I just got back to work, so I'll look into it. :)

From what I can tell, the last update implements all the suggested simplifications. I have a suggestion, but maybe it cuts against the direction you were suggesting.

Overall, I'm happy with this.

lib/Target/AArch64/AArch64CallingConvention.td
39–51 ↗(On Diff #195896)

I feel like the whole discussion of free function vs. C++ instance method (X0/X1) could be avoided by falling through to the normal i64 argument handling below. IIUC, these are roughly the conditions we want:

if (IsSRet && (IsInReg || !IsWin64))
  // assign to X8

I'm not sure how to express that in our calling conv tablegen, though.

lib/Target/AArch64/AArch64MachineFunctionInfo.h
94–98 ↗(On Diff #194616)

If we did want to avoid this duplication, I would say just toss it directly on MachineFunction. But, for now, I think it's best to do things like we normally do: duplicate some logic between targets but try to keep them aligned.

mgrang updated this revision to Diff 196126.Apr 22 2019, 1:20 PM

Do you need to modify AArch64TargetLowering::isEligibleForTailCallOptimization to prevent a tail call in cases where the tail call would return the wrong value?

mgrang updated this revision to Diff 196142.Apr 22 2019, 3:17 PM

Do you need to modify AArch64TargetLowering::isEligibleForTailCallOptimization to prevent a tail call in cases where the tail call would return the wrong value?

Thanks Eli. I have prevented tail call for arm64-windows when sret arguments are present.

efriedma added inline comments.Apr 22 2019, 3:31 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
3435 ↗(On Diff #196142)

Can we restrict this to only arguments with both sret and inreg?

Maybe leave a FIXME to check whether the callee also has an sret+inreg argument. That optimization isn't critical, but nice to have at some point.

mgrang marked an inline comment as done.Apr 22 2019, 3:44 PM
mgrang added inline comments.
lib/Target/AArch64/AArch64ISelLowering.cpp
3435 ↗(On Diff #196142)

You cannot have inreg without sret. So I think just checking inreg should be enough. Will change this to check for inreg. Thanks.

mgrang updated this revision to Diff 196152.Apr 22 2019, 4:03 PM
mgrang marked an inline comment as done.

Is it possible to enable fast-isel and/or global-isel for ARM64 Windows? Do we need additional code for them? (I'm okay with making those changes separate patches, but it should be easy to at least add testcases now.)

lib/Target/AArch64/AArch64CallingConvention.td
57 ↗(On Diff #196152)

Instead of the separate CCIfValNo check, can you just write CCAssignToRegWithShadow<[X0, X1], [W0, W1]>, like we do for normal i64 arguments?

lib/Target/AArch64/AArch64ISelLowering.cpp
3435 ↗(On Diff #196142)

Should we also check for inreg in other places the patch checks for sret?

3218 ↗(On Diff #196152)

We should call LowerFormalArguments exactly once... maybe make this an assertion?

Is it possible to enable fast-isel and/or global-isel for ARM64 Windows? Do we need additional code for them? (I'm okay with making those changes separate patches, but it should be easy to at least add testcases now.)

GlobalISEL is the default for unoptimized builds, and iirc FastISEL should work as well (it was pretty well working already when GlobalISEL was enabled by default for aarch64).

rnk added inline comments.Apr 23 2019, 1:07 PM
lib/IR/Function.cpp
149 ↗(On Diff #196152)

Why have this check? inreg is typically applied to regular integer i32 arguments and vectors.

mgrang updated this revision to Diff 196317.Apr 23 2019, 1:45 PM
mgrang marked 3 inline comments as done.
rnk added inline comments.Apr 23 2019, 2:16 PM
lib/Target/AArch64/AArch64CallingConvention.td
39–51 ↗(On Diff #195896)

Nice, I think this is simpler.

lib/Target/AArch64/AArch64ISelLowering.cpp
3430 ↗(On Diff #196317)

Hang on, why are we checking for inreg at all? Won't we still erroneously perform TCO on this example that we currently TCO?

struct Foo {
  Foo();
  Foo(const Foo &o);
  Foo(int, int);
  ~Foo();
  void *p;
};

void bar(Foo *, int, int);

Foo getfoo(int x, int y) {
  Foo f;
  bar(&f, x, y);
  return f;
}

In this example, NRVO fires, and &f is the sret pointer passed in. bar isn't guaranteed to return &f, but we tail call it, and inreg isn't involved at all.

I think the precise check that we are looking for is that, if the caller has an arg marked sret, then that argument must be passed to the callee and be marked sret. It doesn't matter if the callee receives the sret pointer in a different argument (X0, X1, X8), what matters is that it's returned in X0, and that matches the caller function's requirements. It might also be worth enabling TCO on constructors, which use the returned convention in IR, so something like this:

Foo getfoo(int x, int y) {
  return Foo(x, y);
}

Bailing on the caller using sret is conservatively correct of course.

mgrang marked an inline comment as done.Apr 23 2019, 2:22 PM
mgrang added inline comments.
lib/Target/AArch64/AArch64ISelLowering.cpp
3430 ↗(On Diff #196317)

Thanks Reid. I see in other targets like RISCV we disable TCO if caller or callee use StructRet. Should we do the same here? Also should we restrict that to only Win64?

rnk added inline comments.Apr 23 2019, 2:30 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
3430 ↗(On Diff #196317)

I suppose yes, restricting it to Win64 makes sense, since that's where the preservation requirement comes from.

mgrang updated this revision to Diff 196334.Apr 23 2019, 2:54 PM
mgrang updated this revision to Diff 196338.Apr 23 2019, 2:57 PM
rnk accepted this revision.Apr 23 2019, 4:30 PM

I went over other reviewers comments, and I think this was the only open one:

Is it possible to enable fast-isel and/or global-isel for ARM64 Windows? Do we need additional code for them? (I'm okay with making those changes separate patches, but it should be easy to at least add testcases now.)

Based on the comments in the test, I take it this will be addressed in a subsequent patch. Sounds good.

lgtm

This revision is now accepted and ready to land.Apr 23 2019, 4:30 PM
efriedma added inline comments.Apr 23 2019, 7:11 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
3430 ↗(On Diff #196317)

Hang on, why are we checking for inreg at all?

"inreg" specifically marks the calls where we have to restore x0. For normal C calls, we follow the usual AArch64 ABI rules, and don't need to preserve the sret address.

mgrang marked an inline comment as done.Apr 24 2019, 2:01 PM
mgrang added inline comments.
lib/Target/AArch64/AArch64ISelLowering.cpp
3430 ↗(On Diff #196317)

I agree with Eli. We need to avoid TCO when we have to restore X0. The "sret" attribute is also used for aggregates larger than 16 bytes where the address is passed in X8 but no register needs to be restored.

Like here:

struct S3 { int a, b, c, d, e; };
S3 f3() { return S3{}; }

define dso_local void @"?f3@@YA?AUS3@@XZ"(%struct.S3* noalias nocapture sret %agg.result)

The "inreg" attribute specifies that X0 needs to be restored. So I guess we should indeed check for "inreg" (not "sret").

rnk added inline comments.Apr 24 2019, 3:25 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
3430 ↗(On Diff #196317)

Sorry, my misunderstanding. I thought the sret pointer was returned in all cases: aggregate, non-aggregate, instance method or otherwise, but I see it's only for non-aggregates.

mgrang updated this revision to Diff 196544.Apr 24 2019, 4:20 PM

@rnk @efriedma Is this patch good to commit?

efriedma added inline comments.Apr 29 2019, 12:09 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
3420 ↗(On Diff #196544)

Unnecessary change.

test/CodeGen/AArch64/arm64-windows-returns.ll
72 ↗(On Diff #196544)

Please add a testcase for a non-instance method, where the first argument is inreg.

mgrang updated this revision to Diff 197162.Apr 29 2019, 12:36 PM
mgrang marked an inline comment as done.
efriedma accepted this revision.Apr 29 2019, 12:48 PM

LGTM. (Please merge this at the same time as the clang patch.)