This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC][AIX] ByVal formal argument support: single register.
ClosedPublic

Authored by sfertile on Mar 18 2020, 6:22 PM.

Details

Summary

Adds support for passing ByVal formal arguments as long as they fit in a single register.

Diff Detail

Event Timeline

sfertile created this revision.Mar 18 2020, 6:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2020, 6:22 PM
sfertile added inline comments.Mar 18 2020, 6:24 PM
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll
41–42

If no one has any objections I'll land the check prefix changes in this test and the 64-bit equivalent test as a separate NFC patch.

sfertile updated this revision to Diff 251388.Mar 19 2020, 8:13 AM

Landed check-prefix changes as an NFC patch and rebased ontop of that.

I think that we may need some more struct types with varying size elements. Right now, the tests mostly will check for mostly lbz at an offset. Would be great if we see that lha or even lw is used when required. However, this may be easier to do in a patch where arguments greater than register size are used?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7068

There is already a fatal error in CC_AIX for this, would an assert be better here?

7077

This variable in the other parts of the function is called FIN, we should be consistent I think.

jasonliu added inline comments.Mar 19 2020, 1:09 PM
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll
99

Do we want to add in an 0-byte by-val test?

Couple places that a "const" could be added: Count, LASize, FI, VReg.

sfertile updated this revision to Diff 251620.Mar 20 2020, 6:46 AM
  • rename local to match other uses in function.
  • constify a handful of locals.
sfertile marked 3 inline comments as done.Mar 20 2020, 6:51 AM

I'm going to implement the suggested changes to the testing (ie a zero-sized-byval and different byval types so we are loading more then just a char). I'll do so in a separate patch which makes the changes and shows the difference in the existing tests and then I'll rebase this patch onto that.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7068

That fatal error is going to go away in the next patch implementing multiple registers on the caller side so I have a similar error here so we don't have to change this function in that patch. I can remove it if you want though.

cebowleratibm added inline comments.Mar 20 2020, 6:56 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6985

LLVM has alignTo, which I believe you can use.

7064

I don't object to the name TrueSize but I prefer the name ByValSize.

7065

I understand that you're likely preparing for the next patch, but I think it's simple enough to:

if (ByValSize > PtrByteSize) report_fatal_error

then skip the rounding and write the single register case. You know the StackSize is always PtrByteSize.

7092

Does this store get elided by optimization when possible?

For example:
struct S {int i;};
int foo(S s) { return s.i; }

I assume we don't want to manifest the store to the stack. It's odd to me because we don't manifest the store for the equivalent:

int foo(int i) { return i; }

sfertile updated this revision to Diff 252056.Mar 23 2020, 8:35 AM
sfertile marked 3 inline comments as done.
  • Fixed assertion with zero sized byvals that new testing exposed.
  • Addressed various review comments (renamed TrueSize, use alignTo, etc).
sfertile marked an inline comment as done.Mar 23 2020, 9:01 AM
sfertile added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7065

Rounding the byval size up is the same whether it is smaller then a single reg, or arbitrarily large. Using PtrByte size and skipping the rounding seems rather artificial, I'd rather keep this as is.

7092

No. Several of the lit test cases show where we DAG combine to extract the value from the register (see the 4-byte 32-bit test or the 8-byte 64-bit test cases) but we don't remove the dead store. We will need work that can generalize the optimization the DAG combines is doing for extracting the values from the register, and a pass to clean up the stores.

It's odd to me because we don't manifest the store for the equivalent:
int foo(int i) { return i; }

These are seemingly equivalent in C/C++ source, but consider the IR we have to generate for the 2 cases and I think the difference become apparent:

; Function Attrs: norecurse nounwind readonly
define i32 @foo(%struct.S* nocapture readonly byval(%struct.S) align 4 %s) local_unnamed_addr #0 {
entry:
  %i = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
  %0 = load i32, i32* %i, align 4, !tbaa !3
  ret i32 %0
}

; Function Attrs: norecurse nounwind readnone
define i32 @Bar(i32 returned %i) local_unnamed_addr #1 {
entry:
  ret i32 %i
}

The GEP and the load force us to store to the stack for the ByVal arg making the 2 functions that happen to be 'equivalent' in source to be non-equivalent in their IR representations.

sfertile updated this revision to Diff 252089.Mar 23 2020, 10:00 AM

Run clang-format.

Summary:
-Test tidy ups
-FYI that I noted a 32-bit 8 byte aligned by-val (struct S { double d; }) that will need special handling in future
-Requested some comments in the code / tests to explain the suboptimal codegen, which otherwise looks odd.

The patch looks very close to me.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6867

Minor concern here that ValVT is a lie. I assume the consumer will ignore it when it sees the ByValSize is 0, but I'd rather it hold an invalid value than something that could be conceived as meaningful.

7092

Thanks for the explanation. I think a brief comment in the source to describe why we always need the frame object would be useful. I also made comments in the test changes where I think test comments should explain the suboptimal expected codegen. I am in favour of the patch as you've presented it in order to get AIX functional and follow up with optimization improvement later.

There is a case in 32-bit AIX:

struct S { double d; }

has 8 byte alignment (only for first member is a double) and the normal frame object location with PtrByteSize (4) alignment will not suffice. Currently that case emits an error so it doesn't need to be handled in this patch, but I mention it to you because I expect your code will need to be modified when we remove the error. You may need special handling in this case to remap the frame object location.

llvm/test/CodeGen/PowerPC/aix-cc-byval.ll
86

missing ')'. Still passes but worth tidying.

100

Also missing ')'

105

Curious to me that the optimizer didn't tidy up the stw/lbz. It's ok for now but we need to ensure the param stack writes are being elided for performance.

172

')'

175

missing trailing text?

180

')'

257

trailing text?

262

)

265

trailing text

362

dead stores at opt? If this is temporary it probably warrants a comment in the expected output.

383

)

384

)

IMO, the patch is pretty much done, just some test clean up.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7071

Is it possible to add a continue after line 7070 and then remove the else?

llvm/test/CodeGen/PowerPC/aix-cc-byval.ll
78

Maybe you can remove this line. I think the relevant info for the test is only the lines which show the offset and the size

sfertile updated this revision to Diff 252376.EditedMar 24 2020, 11:35 AM
sfertile marked 4 inline comments as done.
  • split the ByVal handling into memory and register parts to remove an 'else'.
  • add back some missing info for fixed stack objects in the lit test.
  • remove check lines related to fixed stack objects which are not semantically meaningful.

Still working on adding the suggested comments.

sfertile added inline comments.Mar 24 2020, 11:36 AM
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll
86

It's missing from %ir.arrayidx1, align 8) but I didn't think it was important to include in the test output so I truncated it to show only what I though was relevant: that the load is derefrencable. If you want I can add them all back though.

99

Yes, and adding it caught an assertion in the caller of LowerFormalArguemnts which I otherwise missed so thank you.

sfertile marked an inline comment as done.Mar 24 2020, 12:12 PM
sfertile added inline comments.
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll
105

For the 64-bit sub-targets we typically pass what would be ByVal on AIX as arrays of i64/i32 and coerce the values out which means no gep/load in the IR. On PPC32, we only pass the ByVals on the stack so no need to store the registers in the callee. The back end doesn't have anything to clean up the dead stores because we never produce them to begin with.

362

I think we have a 2 stage cleanup in relation to performance:

  1. Extend the optimization that recognized how to pull the field from the register to work more generally
  2. Cleanup of the dead stores when we have extracted all the needed values from the register.

The problem is to big and general to specifically call out here. Every test in this file will need to be changed when we implement that.

sfertile updated this revision to Diff 252415.Mar 24 2020, 1:01 PM
sfertile marked 3 inline comments as done and an inline comment as not done.

Added a comment to describe why we must store the registers used to pas a ByVal to the stack.

cebowleratibm accepted this revision.Mar 24 2020, 4:06 PM
This revision is now accepted and ready to land.Mar 24 2020, 4:06 PM
This revision was automatically updated to reflect the committed changes.