Page MenuHomePhabricator

[AIX] Implement formal arguments passed in stack memory
ClosedPublic

Authored by ZarkoCA on Feb 7 2020, 8:06 AM.

Details

Summary

This patch is the callee side for https://reviews.llvm.org/D73209. It removes the fatal error when we pass more formal arguments than available registers.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ZarkoCA added inline comments.Feb 12 2020, 7:23 PM
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
1428

Added assembly expected output.

1483

Thanks, that's a really good testcase. I added it as mix_callee below along with the respective caller testcase.

Code semantics look ok. If you can post a revision to address the minor concerns I need to have one more pass-through of the test updates in detail.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7027

suggest continue here, then get rid of the else { }

7031

you already latched VA.getLocVT and VA.getValVT into locals you can reuse.

7033

I find the use of name "ObjSize" and "ArgSize" confusing. Suggest sticking to terminology "ValueSize" and "LocSize".

7036

suggest:

// AIX objects are right justified because they are word written to stack memory from their register image.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
1393

suggest 32BIT-LABEL to tag the expected sections.

1432

Suggest adding a LABEL check for the assembly output. Applies multiple.

1433

suggest reordering these lwz closer to the reg use for readability.

ZarkoCA updated this revision to Diff 245118.Feb 18 2020, 3:20 AM
ZarkoCA marked 4 inline comments as done.

Addressed comments and fixed testcases.

ZarkoCA marked 5 inline comments as done.Feb 18 2020, 3:22 AM
ZarkoCA added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7033

Thanks, I agree it's better calling them something else.

All minor comments except for the concern on whether or not a truncation node is needed when ValSize < LocSize.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6873

I agree with this change though I plan that it will be committed on another change to correct the caller stackarg patch I authored previously. I think it's best to do this as an independent change with new test coverage for what was missed.

6877

this comment is now in the wrong place. It belongs with line 6844.

7035

Do we need to add a truncate node when ValueSize is less than LocSize?

Suggest "ValSize".

7036

updated suggestion:

// Objects are right-justified because AIX is big-endian.

I tend to avoid the use of the term word because in Power ISA it means 4 bytes. To others it means PtrByteSize.

7052–7053

MinReservedArea is now a misnomer. This is effectively the LSA + PSA size, whatever the PSA size was evaluated to. Min is always 56 in 32-bit and 112 in 64-bit.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
1383

32BIT-LABEL ?

1409

64BIT-LABEL probably better.

1419

64BIT-LABEL

1647

CHECK-LABEL

1662

CHECK-LABEL

1671

CHECK-LABEL

1899

LABEL

1905

LABEL

1914

LABEL

1928

LABEL

ZarkoCA planned changes to this revision.Feb 20 2020, 11:59 AM
ZarkoCA marked 15 inline comments as done.
ZarkoCA added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6873

I agree, it wasn't meant to be included here initially. I added it when trying to figure out some of the offsets I was seeing. I will keep it in but let the follow up patch which includes this land first before I do any commits of this one.

7035

From what I've looked into, no other PPC targets add a truncate node in their respective LowerFormalArguments() functions. I did find it being done on x86 for i1s, however, and poked around to see why the difference is there.

As far as I understand, it looks like we don't need to do on PPC since PPCTargetLowering::LowerLOAD() will do that when we set the load node in LowerFormalArguments.

The relevant code snippet from PPCTargetLowering::LowerLOAD() is below:

SDValue NewLD =
      DAG.getExtLoad(ISD::EXTLOAD, dl, getPointerTy(DAG.getDataLayout()), Chain,
                     BasePtr, MVT::i8, MMO);
  SDValue Result = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, NewLD);

I'm fairly sure that's the reason why we don't see truncation to loads from memory in the various PPC LowerFormalArguments() since it's done in LowerLoad for certain types.

7052–7053

I think the name is used in all PPC targets because of the call to FuncInfo->setMinReservedArea, I agree with the comment though.

I changed it to CallerReservedArea. I think if I do that, then we don't even need the comment.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
1383

Sorry, missed those in the initial update.

ZarkoCA updated this revision to Diff 245711.Feb 20 2020, 12:02 PM
ZarkoCA marked 2 inline comments as done.

Added Check-Label to testcases.
Changed variable names.
Fixed testcases to improve readability.

cebowleratibm added inline comments.Feb 24 2020, 8:48 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6871–6876

Braces are no longer necessary.

6880

trailing whitespace.

7031

I don't feel that this comment adds to understanding the code. I suggest removing it.

7033

I don't feel that this comment adds to understanding the code. I suggest removing it.

7041

formatting.

ZarkoCA marked 16 inline comments as done.Feb 24 2020, 10:09 AM
ZarkoCA added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7035

To add further, the special load handling is needed for i1s since we cant do i1 sized loads on Power. We extend to an i8 and then truncate.

Also, in 64Bit mode for example, the caller will store parameters sized 4 bytes or less with 8 byte loads and sign or zero extended to 8bytes. For the callee, we adjust the offset where the load happens to load only the 4bytes containing the argument and then do a 4byte load from there. There shouldn't be a need to truncate in that case.

ZarkoCA updated this revision to Diff 246246.Feb 24 2020, 10:12 AM

Fixed trailing whitespace issues when applying the patch.
Fixed formatting and removed redundant comments.

I think we need some changes to ensure frame objects are created with their proper size and not word size. i64 is passed as two i32s but we should coalesce these into a single 8-byte object. Like-wise arguments smaller than word size are generating frame objects that are word size.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7036

I don't see the updated comment reflected in the current patch. I still see:

// AIX objects are right justified because they are word written to stack
// memory from their register image.

and I would prefer this to say

// Objects are right-justified because AIX is big-endian.
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
1402

When i looked at Linux PPC32 I saw that they used 8 byte frame objects for the long long parameters. I think we should be consistent with the precedent.

1434

I got confused by why there's no load at 56 to account for the msw of the 9th parameter (i64) then realized the result is stored in i32, so the optimizer is probably getting clever. To keep the test straight forward and avoid confusion, I'd suggest accumulating the result in an i64 so that we'll see (and can validate) the load at 56.

1438

My preference is to see the:
add 3, 3, 5
add 3, 3, 6
add 3,3, 7
...
as a block and coalesce the stack loads next to where they're used.

1464

The ASM32PWR4-DAG expects the add operations but they're omitted in the ASM64PWR4-DAG expected output.

I don't strongly prefer either though I strongly prefer consistency.

1491

I usually name the toc load appending ADDR to indicate the register holds the address of the variable, then name the loads after to indicate they hold the value of the variable. Seems the naming you have here his backwards.

1520

Logically I like to see the parameter initializations in order. I find it easier to understand in the approach.

1581

would be a bit nicer to tidy the order of parameter reg inits.

1714

A little bit nicer if these are moved next to their use in the DAG block.

1775

why are all the other STW flagged "killed" and this one not?

1795

I don't think that it's useful to expect the constant pool loads given that the STW matches on SCRATCHREG. There's no validation of where each constant is written and we're not intending to test the constant loads anyways.

1863

REGDADDR

1866

lfd [[REGD:[0-9]+]], 0([[REGDADDR]])

1867

diddo on REGF1 and REGF2

1943

Should be size 2. Though the arg passes in 4 bytes, the object is 2 bytes at offset 58.

I think we need some changes to ensure frame objects are created with their proper size and not word size.

We have 2 choices for the smaller then save slot sized objects:

  1. Follow XL's behavior and perform a load of the entire save slot, and create frame objects of this size to match. If we chose this we would have to build and insert 'AssertSExt' and 'AssertZExt` nodes and truncates appropriately. The truncates should get cleaned up down stream due to the assert-extended nodes.
  2. Perform the offset adjustment and load using the load instruction with the correct extending type, and have frame objects created at the correct offset matching the size of the load.

It seems we are mixing these 2 right now? but I have to play with some smaller tests to make sure my understanding is correct.

Having the caller and callee semantics in the same test is helpful, but it leads to having really large tests. Can we split the tests into 2. 1 for everything fits in regs and 1 for when we have to go to the stack?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6873

I don't see the LocInfo being used on the Memloc path for lower formal args, so this shouldn't semantically affect this patch.Is it included because it affects the lit tests in an observable way? If it doesn't pull it out, if it does then you can pull this patch into your branch and use just the diff of your changes for this review.

7037

Why don't we have to perform the adjustment for ppc32? IIUC ValSize is promoted to i32 for i16/i8, but what about i1s? They shouldn't be promoted by the generic selection dag code because they are a legal register type on PowerPC. At the very least we need a lit test that includes i1 zeroext passed in a stack slot. Sorry if there is one, I scanned through and don't see one but the test changes are 1000 lines long :)

ZarkoCA marked 21 inline comments as done.Feb 27 2020, 5:33 PM
ZarkoCA added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7036

I missed this, sorry about that.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
1402

I spent a while investigating this and it looks like the PPC32 and 32BIT AIX do the same thing here.

I see that both create two 4yte sized fixed frame objects for one i64 int:
AIX:

fixedStack:
  - { id: 0, type: default, offset: 60, size: 4, alignment: 4, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
  - { id: 1, type: default, offset: 56, size: 4, alignment: 8, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

PPC32

fixedStack:
  - { id: 0, type: default, offset: 12, size: 4, alignment: 4, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
  - { id: 1, type: default, offset: 8, size: 4, alignment: 8, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

However, if we don't run the IR under opt I see we add it adds allocas and stores for each of the parameter and this is what I think creates local stack objects for all the parameters, including a size 8 for the i64 parameter in both AIX and PPC32.

On PPC32 and 32BIT AIX all of these local stack objects are at offset 0 and a negative local-offset, their size is :

- { id: 8, name: ll9.addr, type: default, offset: 0, size: 8, alignment: 8,
    stack-id: default, callee-saved-register: '', callee-saved-restored: true,
    local-offset: -40, debug-info-variable: '', debug-info-expression: '',
    debug-info-location: '' }

I'm not sure that what the use of these stack objects is since they are eliminated if I run the IR under opt which is what I've done to generate the IR for all test cases. I think they are related to lowering of the stack frame but I'm not entirely sure yet.

1434

My initial thought in doing it this was to show that on AIX the upper word of the i64 holds the most significant values. But I agree, it's confusing and it's better to accumulate in an i64.

1438

I think it's best to remove these as per your comment and my reply below.

1464

This is inconsistent, your right. I'm leaning toward only showing the loads and stores in these test cases to more clear about what we are testing. It's also the fastest change to make.

1491

Thanks, fixed now.

1520

Yes, that's better, I agree.

1775

I'm not sure...I found that if I change the value that's held in that register (the double 1.000000e-01) to anything else, the register is then flagged "killed" like the rest.

I will need to follow up with some more investigation to find what determines when a register will not be killed in a case like this.

1795

In this case here I'm using SCRATCHREG only to not hard code the registers in which we do the constant pool loads. Hypothetically, they can be any register and I don't want the test to break if any of those scratch registers change at any point in time. I could take them out altogether maybe?

1943

To me it looks like CC_AIX promotes i16 to i32s, and the ValVT passed to getLoad() is i32, so I think this is why it does a 4 byte load at offset 56.

// Promote integers if needed.
    if (ValVT.getSizeInBits() < RegVT.getSizeInBits())
      LocInfo = ArgFlags.isSExt() ? CCValAssign::LocInfo::SExt
                                  : CCValAssign::LocInfo::ZExt;

Is this not correct in 32BIT mode? I see PPC32 do the same thing and XLC on AIX also using a 4byte load (lwz) at offset 56 for an i16.

ZarkoCA updated this revision to Diff 247144.Feb 27 2020, 5:39 PM
ZarkoCA marked 6 inline comments as done.

Updated comment.
Fixed testcases:

  • Rearranged IR and assembly of parameters in which they are initialized.
  • Renamed filecheck variables to follow convention already established in aix-cc-abi.ll
ZarkoCA marked 2 inline comments as done.Feb 27 2020, 5:51 PM
ZarkoCA added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6873

It's not needed for the lower formal args but this patch includes caller tests which are affected.

7037

Sorry, @sfertile I missed your comments and you're right. The i1 zeroext is a case which we'll need to do exactly that. I will remove the IsPPC64 and a test for it.

ZarkoCA updated this revision to Diff 247263.Feb 28 2020, 7:27 AM

Change the offset for args smaller than loc size on 32BIT too and add a bool stack test case.

ZarkoCA marked 2 inline comments as done.EditedFeb 28 2020, 7:40 AM

I think we need some changes to ensure frame objects are created with their proper size and not word size.

We have 2 choices for the smaller then save slot sized objects:

  1. Follow XL's behavior and perform a load of the entire save slot, and create frame objects of this size to match. If we chose this we would have to build and insert 'AssertSExt' and 'AssertZExt` nodes and truncates appropriately. The truncates should get cleaned up down stream due to the assert-extended nodes.
  2. Perform the offset adjustment and load using the load instruction with the correct extending type, and have frame objects created at the correct offset matching the size of the load.

    It seems we are mixing these 2 right now? but I have to play with some smaller tests to make sure my understanding is correct.

On the callee side, for i32s on 64BIT and i1s in 32/64BIT we create fixed objects the size of the value type and then use the appropriate load (lw or lb). Except for i8 and i16s which I think are extended to fit the reg size. I think this is consistent with LLVM PPC32.

On the caller side we always do a PtrByteSize store.

Having the caller and callee semantics in the same test is helpful, but it leads to having really large tests. Can we split the tests into 2. 1 for everything fits in regs and 1 for when we have to go to the stack?

Maybe we can do an NFC patch which splits the tests?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7037

I made it so we to the adjustment on 32BIT too and added a bool test case.

ZarkoCA updated this revision to Diff 247270.Feb 28 2020, 8:05 AM

Fix whitespace errors, sorry for the noise.

ZarkoCA updated this revision to Diff 248527.Thu, Mar 5, 9:53 AM

Rebase off https://reviews.llvm.org/D75126 and fix expected output of caller side tests based on that patch.

sfertile accepted this revision.Mon, Mar 9, 7:03 AM

LGTM.

This revision is now accepted and ready to land.Mon, Mar 9, 7:03 AM
This revision was automatically updated to reflect the committed changes.