Adds support for passing ByVal formal arguments as long as they fit in a single register.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll | ||
---|---|---|
41–42 | If no one has any objections I'll land the check prefix changes in this test and the 64-bit equivalent test as a separate NFC patch. |
I think that we may need some more struct types with varying size elements. Right now, the tests mostly will check for mostly lbz at an offset. Would be great if we see that lha or even lw is used when required. However, this may be easier to do in a patch where arguments greater than register size are used?
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7068 | There is already a fatal error in CC_AIX for this, would an assert be better here? | |
7077 | This variable in the other parts of the function is called FIN, we should be consistent I think. |
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll | ||
---|---|---|
99 | Do we want to add in an 0-byte by-val test? |
I'm going to implement the suggested changes to the testing (ie a zero-sized-byval and different byval types so we are loading more then just a char). I'll do so in a separate patch which makes the changes and shows the difference in the existing tests and then I'll rebase this patch onto that.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7068 | That fatal error is going to go away in the next patch implementing multiple registers on the caller side so I have a similar error here so we don't have to change this function in that patch. I can remove it if you want though. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6985 | LLVM has alignTo, which I believe you can use. | |
7064 | I don't object to the name TrueSize but I prefer the name ByValSize. | |
7065 | I understand that you're likely preparing for the next patch, but I think it's simple enough to: if (ByValSize > PtrByteSize) report_fatal_error then skip the rounding and write the single register case. You know the StackSize is always PtrByteSize. | |
7092 | Does this store get elided by optimization when possible? For example: I assume we don't want to manifest the store to the stack. It's odd to me because we don't manifest the store for the equivalent: int foo(int i) { return i; } |
- Fixed assertion with zero sized byvals that new testing exposed.
- Addressed various review comments (renamed TrueSize, use alignTo, etc).
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7065 | Rounding the byval size up is the same whether it is smaller then a single reg, or arbitrarily large. Using PtrByte size and skipping the rounding seems rather artificial, I'd rather keep this as is. | |
7092 | No. Several of the lit test cases show where we DAG combine to extract the value from the register (see the 4-byte 32-bit test or the 8-byte 64-bit test cases) but we don't remove the dead store. We will need work that can generalize the optimization the DAG combines is doing for extracting the values from the register, and a pass to clean up the stores.
These are seemingly equivalent in C/C++ source, but consider the IR we have to generate for the 2 cases and I think the difference become apparent: ; Function Attrs: norecurse nounwind readonly define i32 @foo(%struct.S* nocapture readonly byval(%struct.S) align 4 %s) local_unnamed_addr #0 { entry: %i = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0 %0 = load i32, i32* %i, align 4, !tbaa !3 ret i32 %0 } ; Function Attrs: norecurse nounwind readnone define i32 @Bar(i32 returned %i) local_unnamed_addr #1 { entry: ret i32 %i } The GEP and the load force us to store to the stack for the ByVal arg making the 2 functions that happen to be 'equivalent' in source to be non-equivalent in their IR representations. |
Summary:
-Test tidy ups
-FYI that I noted a 32-bit 8 byte aligned by-val (struct S { double d; }) that will need special handling in future
-Requested some comments in the code / tests to explain the suboptimal codegen, which otherwise looks odd.
The patch looks very close to me.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6867 | Minor concern here that ValVT is a lie. I assume the consumer will ignore it when it sees the ByValSize is 0, but I'd rather it hold an invalid value than something that could be conceived as meaningful. | |
7092 | Thanks for the explanation. I think a brief comment in the source to describe why we always need the frame object would be useful. I also made comments in the test changes where I think test comments should explain the suboptimal expected codegen. I am in favour of the patch as you've presented it in order to get AIX functional and follow up with optimization improvement later. There is a case in 32-bit AIX: struct S { double d; } has 8 byte alignment (only for first member is a double) and the normal frame object location with PtrByteSize (4) alignment will not suffice. Currently that case emits an error so it doesn't need to be handled in this patch, but I mention it to you because I expect your code will need to be modified when we remove the error. You may need special handling in this case to remap the frame object location. | |
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll | ||
86 | missing ')'. Still passes but worth tidying. | |
100 | Also missing ')' | |
105 | Curious to me that the optimizer didn't tidy up the stw/lbz. It's ok for now but we need to ensure the param stack writes are being elided for performance. | |
172 | ')' | |
175 | missing trailing text? | |
180 | ')' | |
257 | trailing text? | |
262 | ) | |
265 | trailing text | |
362 | dead stores at opt? If this is temporary it probably warrants a comment in the expected output. | |
383 | ) | |
384 | ) |
IMO, the patch is pretty much done, just some test clean up.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7071 | Is it possible to add a continue after line 7070 and then remove the else? | |
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll | ||
78 | Maybe you can remove this line. I think the relevant info for the test is only the lines which show the offset and the size |
- split the ByVal handling into memory and register parts to remove an 'else'.
- add back some missing info for fixed stack objects in the lit test.
- remove check lines related to fixed stack objects which are not semantically meaningful.
Still working on adding the suggested comments.
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll | ||
---|---|---|
86 | It's missing from %ir.arrayidx1, align 8) but I didn't think it was important to include in the test output so I truncated it to show only what I though was relevant: that the load is derefrencable. If you want I can add them all back though. | |
99 | Yes, and adding it caught an assertion in the caller of LowerFormalArguemnts which I otherwise missed so thank you. |
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll | ||
---|---|---|
105 | For the 64-bit sub-targets we typically pass what would be ByVal on AIX as arrays of i64/i32 and coerce the values out which means no gep/load in the IR. On PPC32, we only pass the ByVals on the stack so no need to store the registers in the callee. The back end doesn't have anything to clean up the dead stores because we never produce them to begin with. | |
362 | I think we have a 2 stage cleanup in relation to performance:
The problem is to big and general to specifically call out here. Every test in this file will need to be changed when we implement that. |
Added a comment to describe why we must store the registers used to pas a ByVal to the stack.
clang-format: please reformat the code