This patch is the callee side for https://reviews.llvm.org/D73209. It removes the fatal error when we pass more formal arguments than available registers.
Details
Diff Detail
Event Timeline
There's some trailing whitespace that emits warnings when I apply the patch. Probably a good idea to tidy it up. I'll continue to review the content of that patch.
$ git apply D74225.diff.txt
D74225.diff.txt:157: trailing whitespace.
; 32BIT: - { id: 3, type: default, offset: 80, size: 4
D74225.diff.txt:161: trailing whitespace.
; 32BIT: - { id: 7, type: default, offset: 64, size: 4
D74225.diff.txt:179: trailing whitespace.
; 64BIT: - { id: 0, type: default, offset: 168, size: 8
D74225.diff.txt:180: trailing whitespace.
; 64BIT: - { id: 1, type: default, offset: 160, size: 8
D74225.diff.txt:181: trailing whitespace.
; 64BIT: - { id: 2, type: default, offset: 152, size: 8
warning: squelched 10 whitespace errors
warning: 15 lines add whitespace errors.
Did a passthrough of the test changes with a number of comments. I expect you wanted to make some test updates, nonetheless I thought I'd document what I'd like to see in the test coverage. Will review code changes next.
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll | ||
---|---|---|
1302 | suggest using 32BIT-DAG for the liveins. | |
1311 | I suggest 32BIT-DAG and reorder the memory locations in growing offset. IMO it makes it easier to read and understand the test. | |
1327 | DAG. | |
1336 | DAG and reorder. | |
1346 | I would like to see the expected output for the assembly added. We should validate that the instructions emitted locate the arguments in the correct memory location. We should also strive for consistency with the other tests, which have expected assembly. | |
1374 | suggest DAG. | |
1387 | remove blank line or add blank line in the earlier test for the sake of consistency. | |
1400 | I would also like to see the expected assembly here as well. | |
1401 | Additional test requests: ex: foo(double, double, double, double, char, short, int) |
This concludes my initial review. Mostly tidy up comments. The logic seems correct so semantically I think it's ok (for the incremental support that it adds.) I'll see if I can break it on the next round.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6957 | I think it would be useful to add a !byVal assertion on the args just to make it clear the logic you've implemented is not intended to handle this case. | |
6959 | whitespace. formatting. | |
6960 | I think it's better if this def is placed closer to its use. | |
6975–6976 | Since there doesn't appear to be any necessary handling to sign or zero extend on this code path, should there be an assertion? | |
6976 | I'd rather see this pushed into the respective conditional blocks. It's not used outside so it's cleaner to contain it locally. | |
6979 | I think this should be an assertion. CC_AIX should never push a loc which is neither reg nor mem. You could switch this up a bit: if (VA.isRegLoc()) { ...} | |
7023–7024 | formatting. | |
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll | ||
1271 | The convention we've used for cc tests so far is to do expected output for both callee and caller. I'd like to see the caller IR and expected output as well. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6979 | I can change that to an assert. I was only following the precedent in LowerCall_AIX which uses 'report_fatal_error'. I can check the code to: assert((VA.isRegLoc() || VA.isMemLoc()) if (VA.isRegloc()){ register code ... }else { assert(VA.isMemLoc()) memory code ... } Does that work? |
Added new testcases which include both callee and callee IR and assembly expected output.
Addressed comments:
- Added ByVal assertion
- Replaced no RegLoc/MemLoc fatal error with an assertion
- Moved variable definitions closer to where they are used
- Fixed formatting
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6957 | Good idea, added. | |
6975–6976 | I added assert(ObjSize <= ArgSize); is this what you had in mind? Or is a different assert specifically related to sign or zero extension needed? | |
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll | ||
1271 | Added both caller and callee for all tests. | |
1311 | Agreed, it does look better once done that way. |
Code semantics look ok. If you can post a revision to address the minor concerns I need to have one more pass-through of the test updates in detail.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7001 | suggest continue here, then get rid of the else { } | |
7005 | you already latched VA.getLocVT and VA.getValVT into locals you can reuse. | |
7007 | I find the use of name "ObjSize" and "ArgSize" confusing. Suggest sticking to terminology "ValueSize" and "LocSize". | |
7010 | suggest: // AIX objects are right justified because they are word written to stack memory from their register image. | |
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll | ||
1311 | suggest 32BIT-LABEL to tag the expected sections. | |
1350 | Suggest adding a LABEL check for the assembly output. Applies multiple. | |
1351 | suggest reordering these lwz closer to the reg use for readability. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7007 | Thanks, I agree it's better calling them something else. |
All minor comments except for the concern on whether or not a truncation node is needed when ValSize < LocSize.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6846 | I agree with this change though I plan that it will be committed on another change to correct the caller stackarg patch I authored previously. I think it's best to do this as an independent change with new test coverage for what was missed. | |
6850 | this comment is now in the wrong place. It belongs with line 6844. | |
7010 | updated suggestion: // Objects are right-justified because AIX is big-endian. I tend to avoid the use of the term word because in Power ISA it means 4 bytes. To others it means PtrByteSize. | |
7029 | MinReservedArea is now a misnomer. This is effectively the LSA + PSA size, whatever the PSA size was evaluated to. Min is always 56 in 32-bit and 112 in 64-bit. | |
7029 | Do we need to add a truncate node when ValueSize is less than LocSize? Suggest "ValSize". | |
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll | ||
1301 | 32BIT-LABEL ? | |
1327 | 64BIT-LABEL probably better. | |
1337 | 64BIT-LABEL | |
1565 | CHECK-LABEL | |
1580 | CHECK-LABEL | |
1589 | CHECK-LABEL | |
1817 | LABEL | |
1823 | LABEL | |
1832 | LABEL | |
1846 | LABEL |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6846 | I agree, it wasn't meant to be included here initially. I added it when trying to figure out some of the offsets I was seeing. I will keep it in but let the follow up patch which includes this land first before I do any commits of this one. | |
7029 | I think the name is used in all PPC targets because of the call to FuncInfo->setMinReservedArea, I agree with the comment though. I changed it to CallerReservedArea. I think if I do that, then we don't even need the comment. | |
7029 | From what I've looked into, no other PPC targets add a truncate node in their respective LowerFormalArguments() functions. I did find it being done on x86 for i1s, however, and poked around to see why the difference is there. As far as I understand, it looks like we don't need to do on PPC since PPCTargetLowering::LowerLOAD() will do that when we set the load node in LowerFormalArguments. The relevant code snippet from PPCTargetLowering::LowerLOAD() is below: SDValue NewLD = DAG.getExtLoad(ISD::EXTLOAD, dl, getPointerTy(DAG.getDataLayout()), Chain, BasePtr, MVT::i8, MMO); SDValue Result = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, NewLD); I'm fairly sure that's the reason why we don't see truncation to loads from memory in the various PPC LowerFormalArguments() since it's done in LowerLoad for certain types. | |
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll | ||
1301 | Sorry, missed those in the initial update. |
Added Check-Label to testcases.
Changed variable names.
Fixed testcases to improve readability.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6844–6848 | Braces are no longer necessary. | |
6853 | trailing whitespace. | |
7025 | I don't feel that this comment adds to understanding the code. I suggest removing it. | |
7027 | I don't feel that this comment adds to understanding the code. I suggest removing it. | |
7035 | formatting. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7029 | To add further, the special load handling is needed for i1s since we cant do i1 sized loads on Power. We extend to an i8 and then truncate. Also, in 64Bit mode for example, the caller will store parameters sized 4 bytes or less with 8 byte loads and sign or zero extended to 8bytes. For the callee, we adjust the offset where the load happens to load only the 4bytes containing the argument and then do a 4byte load from there. There shouldn't be a need to truncate in that case. |
Fixed trailing whitespace issues when applying the patch.
Fixed formatting and removed redundant comments.
I think we need some changes to ensure frame objects are created with their proper size and not word size. i64 is passed as two i32s but we should coalesce these into a single 8-byte object. Like-wise arguments smaller than word size are generating frame objects that are word size.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7010 | I don't see the updated comment reflected in the current patch. I still see: // AIX objects are right justified because they are word written to stack // memory from their register image. and I would prefer this to say // Objects are right-justified because AIX is big-endian. | |
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll | ||
1320 | When i looked at Linux PPC32 I saw that they used 8 byte frame objects for the long long parameters. I think we should be consistent with the precedent. | |
1352 | I got confused by why there's no load at 56 to account for the msw of the 9th parameter (i64) then realized the result is stored in i32, so the optimizer is probably getting clever. To keep the test straight forward and avoid confusion, I'd suggest accumulating the result in an i64 so that we'll see (and can validate) the load at 56. | |
1356 | My preference is to see the: | |
1382 | The ASM32PWR4-DAG expects the add operations but they're omitted in the ASM64PWR4-DAG expected output. I don't strongly prefer either though I strongly prefer consistency. | |
1409 | I usually name the toc load appending ADDR to indicate the register holds the address of the variable, then name the loads after to indicate they hold the value of the variable. Seems the naming you have here his backwards. | |
1438 | Logically I like to see the parameter initializations in order. I find it easier to understand in the approach. | |
1499 | would be a bit nicer to tidy the order of parameter reg inits. | |
1632 | A little bit nicer if these are moved next to their use in the DAG block. | |
1693 | why are all the other STW flagged "killed" and this one not? | |
1713 | I don't think that it's useful to expect the constant pool loads given that the STW matches on SCRATCHREG. There's no validation of where each constant is written and we're not intending to test the constant loads anyways. | |
1781 | REGDADDR | |
1784 | lfd [[REGD:[0-9]+]], 0([[REGDADDR]]) | |
1785 | diddo on REGF1 and REGF2 | |
1861 | Should be size 2. Though the arg passes in 4 bytes, the object is 2 bytes at offset 58. |
We have 2 choices for the smaller then save slot sized objects:
- Follow XL's behavior and perform a load of the entire save slot, and create frame objects of this size to match. If we chose this we would have to build and insert 'AssertSExt' and 'AssertZExt` nodes and truncates appropriately. The truncates should get cleaned up down stream due to the assert-extended nodes.
- Perform the offset adjustment and load using the load instruction with the correct extending type, and have frame objects created at the correct offset matching the size of the load.
It seems we are mixing these 2 right now? but I have to play with some smaller tests to make sure my understanding is correct.
Having the caller and callee semantics in the same test is helpful, but it leads to having really large tests. Can we split the tests into 2. 1 for everything fits in regs and 1 for when we have to go to the stack?
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6846 | I don't see the LocInfo being used on the Memloc path for lower formal args, so this shouldn't semantically affect this patch.Is it included because it affects the lit tests in an observable way? If it doesn't pull it out, if it does then you can pull this patch into your branch and use just the diff of your changes for this review. | |
7031 | Why don't we have to perform the adjustment for ppc32? IIUC ValSize is promoted to i32 for i16/i8, but what about i1s? They shouldn't be promoted by the generic selection dag code because they are a legal register type on PowerPC. At the very least we need a lit test that includes i1 zeroext passed in a stack slot. Sorry if there is one, I scanned through and don't see one but the test changes are 1000 lines long :) |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7010 | I missed this, sorry about that. | |
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll | ||
1320 | I spent a while investigating this and it looks like the PPC32 and 32BIT AIX do the same thing here. I see that both create two 4yte sized fixed frame objects for one i64 int: fixedStack: - { id: 0, type: default, offset: 60, size: 4, alignment: 4, stack-id: default, isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true, debug-info-variable: '', debug-info-expression: '', debug-info-location: '' } - { id: 1, type: default, offset: 56, size: 4, alignment: 8, stack-id: default, isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true, debug-info-variable: '', debug-info-expression: '', debug-info-location: '' } PPC32 fixedStack: - { id: 0, type: default, offset: 12, size: 4, alignment: 4, stack-id: default, isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true, debug-info-variable: '', debug-info-expression: '', debug-info-location: '' } - { id: 1, type: default, offset: 8, size: 4, alignment: 8, stack-id: default, isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true, debug-info-variable: '', debug-info-expression: '', debug-info-location: '' } However, if we don't run the IR under opt I see we add it adds allocas and stores for each of the parameter and this is what I think creates local stack objects for all the parameters, including a size 8 for the i64 parameter in both AIX and PPC32. On PPC32 and 32BIT AIX all of these local stack objects are at offset 0 and a negative local-offset, their size is : - { id: 8, name: ll9.addr, type: default, offset: 0, size: 8, alignment: 8, stack-id: default, callee-saved-register: '', callee-saved-restored: true, local-offset: -40, debug-info-variable: '', debug-info-expression: '', debug-info-location: '' } I'm not sure that what the use of these stack objects is since they are eliminated if I run the IR under opt which is what I've done to generate the IR for all test cases. I think they are related to lowering of the stack frame but I'm not entirely sure yet. | |
1352 | My initial thought in doing it this was to show that on AIX the upper word of the i64 holds the most significant values. But I agree, it's confusing and it's better to accumulate in an i64. | |
1356 | I think it's best to remove these as per your comment and my reply below. | |
1382 | This is inconsistent, your right. I'm leaning toward only showing the loads and stores in these test cases to more clear about what we are testing. It's also the fastest change to make. | |
1409 | Thanks, fixed now. | |
1438 | Yes, that's better, I agree. | |
1693 | I'm not sure...I found that if I change the value that's held in that register (the double 1.000000e-01) to anything else, the register is then flagged "killed" like the rest. I will need to follow up with some more investigation to find what determines when a register will not be killed in a case like this. | |
1713 | In this case here I'm using SCRATCHREG only to not hard code the registers in which we do the constant pool loads. Hypothetically, they can be any register and I don't want the test to break if any of those scratch registers change at any point in time. I could take them out altogether maybe? | |
1861 | To me it looks like CC_AIX promotes i16 to i32s, and the ValVT passed to getLoad() is i32, so I think this is why it does a 4 byte load at offset 56. // Promote integers if needed. if (ValVT.getSizeInBits() < RegVT.getSizeInBits()) LocInfo = ArgFlags.isSExt() ? CCValAssign::LocInfo::SExt : CCValAssign::LocInfo::ZExt; Is this not correct in 32BIT mode? I see PPC32 do the same thing and XLC on AIX also using a 4byte load (lwz) at offset 56 for an i16. |
Updated comment.
Fixed testcases:
- Rearranged IR and assembly of parameters in which they are initialized.
- Renamed filecheck variables to follow convention already established in aix-cc-abi.ll
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6846 | It's not needed for the lower formal args but this patch includes caller tests which are affected. | |
7031 | Sorry, @sfertile I missed your comments and you're right. The i1 zeroext is a case which we'll need to do exactly that. I will remove the IsPPC64 and a test for it. |
Change the offset for args smaller than loc size on 32BIT too and add a bool stack test case.
On the callee side, for i32s on 64BIT and i1s in 32/64BIT we create fixed objects the size of the value type and then use the appropriate load (lw or lb). Except for i8 and i16s which I think are extended to fit the reg size. I think this is consistent with LLVM PPC32.
On the caller side we always do a PtrByteSize store.
Having the caller and callee semantics in the same test is helpful, but it leads to having really large tests. Can we split the tests into 2. 1 for everything fits in regs and 1 for when we have to go to the stack?
Maybe we can do an NFC patch which splits the tests?
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7031 | I made it so we to the adjustment on 32BIT too and added a bool test case. |
Rebase off https://reviews.llvm.org/D75126 and fix expected output of caller side tests based on that patch.
I agree with this change though I plan that it will be committed on another change to correct the caller stackarg patch I authored previously. I think it's best to do this as an independent change with new test coverage for what was missed.