
CXX_FAST_TLS calling convention: performance improvement for PPC64

Authored by tjablin on Feb 22 2016, 8:06 PM.



This is the same change for PPC64 that r255821 made for AArch64. The description below is borrowed from that commit message.

The access function has a short entry and a short exit; the initialization
block runs only the first time. To improve performance, we want a short
frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by the register allocator, which will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, which will be explicitly handled during lowering.

1> We first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization.

2> We call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at the beginning of the entry block and copies from virtual registers to CSRsViaCopy at the beginning of the exit blocks.

3> We also need to make sure the explicit copies will not be eliminated.
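
As a rough illustration of step 2, the copy insertion can be modelled with plain containers. This is a toy sketch, not the patch and not LLVM API: Block, insertSplitCSRCopies, the textual "COPY" instructions, and the %vN virtual-register names are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine basic block: just a list of textual instructions.
// This is NOT an LLVM type.
struct Block {
  std::vector<std::string> Insts;
};

// Hypothetical helper modelling step 2: for every register in CSRsViaCopy,
// copy it into a fresh virtual register at the top of the entry block and
// copy it back at the top of each exit block.
void insertSplitCSRCopies(const std::vector<std::string> &CSRsViaCopy,
                          Block &Entry, const std::vector<Block *> &Exits) {
  unsigned NextVReg = 0;
  std::vector<std::string> EntryCopies;
  for (const std::string &Reg : CSRsViaCopy) {
    const std::string VReg = "%v" + std::to_string(NextVReg++);
    // CSR -> virtual register, collected so entry order matches list order.
    EntryCopies.push_back(VReg + " = COPY " + Reg);
    for (Block *Exit : Exits) {
      assert(Exit && "exit block must not be null");
      // Virtual register -> CSR, placed at the beginning of the exit block.
      Exit->Insts.insert(Exit->Insts.begin(), Reg + " = COPY " + VReg);
    }
  }
  Entry.Insts.insert(Entry.Insts.begin(), EntryCopies.begin(),
                     EntryCopies.end());
}
```

In this model the register allocator would be free to spill the virtual registers only where they are actually clobbered, which is the point of keeping the copies explicit.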

Event Timeline

tjablin updated this revision to Diff 48772.Feb 22 2016, 8:06 PM
tjablin retitled this revision from to CXX_FAST_TLS calling convention: performance improvement for PPC64.
tjablin updated this object.
tjablin added reviewers: cycheng, kbarton.
tjablin updated this revision to Diff 48816.Feb 23 2016, 7:19 AM

Improve test case. Fix formatting errors. Omit changes to PPCTargetMachine.cpp that were part of another patch.

tjablin added a subscriber: llvm-commits.
kbarton edited edge metadata.Mar 30 2016, 10:34 AM

Please add more information to the summary about what this is and why it is necessary.
Also, a link to the Phabricator review for the AArch64 change would be very helpful for people who are reviewing this through a browser.


Please add a comment here about why this is necessary to do before the early return.
This will (hopefully) ensure this doesn't get moved below the return and introduce a problem again.

kbarton added inline comments.Mar 30 2016, 1:24 PM

Is this guaranteed to be a null-terminated list?
I looked quickly but could not find the answer.
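
For readers following the question, a loop over a 0-terminated register list typically looks like the sketch below. It only illustrates the convention being asked about and the assumption such a loop relies on. PhysReg and collectRegs are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a physical register id; 0 is used as the sentinel.
using PhysReg = std::uint16_t;

// Collect register ids from a list that ends with a 0 sentinel.
// Precondition: List is non-null and really is 0-terminated; otherwise the
// loop reads past the end of the array.
std::vector<PhysReg> collectRegs(const PhysReg *List) {
  assert(List && "expected a non-null register list");
  std::vector<PhysReg> Regs;
  for (const PhysReg *I = List; *I; ++I)
    Regs.push_back(*I);
  return Regs;
}
```

The loop is only safe when the producer of the list guarantees the terminator, which is why the guarantee is worth confirming.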


It could be my browser, but the indentation of the return looks way off.


I thought this was disabled for 32-bit on line 11613 of PPCISelLowering.cpp.
Can you combine the check for 32-bit with the check for DarwinABI on line 138? That should simplify this logic a bit.


I find this nesting of conditional operators very difficult to read.
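
One common way to address this kind of readability concern is to replace a nested conditional chain with early returns. The sketch below is purely illustrative: TargetFlags, pickViaCopyList, and the list names are hypothetical and do not mirror the code under review.

```cpp
#include <cassert>
#include <string>

// Hypothetical flags, for illustration only.
struct TargetFlags {
  bool Is64Bit;
  bool IsDarwin;
  bool HasVector;
};

// The nested form would read:
//   return !F.Is64Bit ? "" : F.IsDarwin ? "" : F.HasVector ? "ViaCopy_Vector"
//                                                          : "ViaCopy_Base";
// The same decision written with early returns:
std::string pickViaCopyList(const TargetFlags &F) {
  if (!F.Is64Bit || F.IsDarwin)
    return ""; // Unsupported configurations: nothing is handled via copy.
  if (F.HasVector)
    return "ViaCopy_Vector";
  return "ViaCopy_Base";
}
```

Folding the unsupported cases into a single guard keeps each remaining branch to one condition.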

kbarton requested changes to this revision.Apr 1 2016, 1:49 PM
kbarton edited edge metadata.

Please see my previous comments.

This revision now requires changes to proceed.Apr 1 2016, 1:49 PM
tjablin updated this revision to Diff 52644.Apr 4 2016, 5:37 PM
tjablin edited edge metadata.

Simplify logic in getCalleeSavedRegsViaCopy. Add comments.

kbarton accepted this revision.Apr 6 2016, 8:53 AM
kbarton edited edge metadata.

When committing, please add more details in the commit message about what exactly this is doing. The current summary requires you to look up another revision to understand what this is doing. Thanks.

This revision is now accepted and ready to land.Apr 6 2016, 8:53 AM
tjablin updated this object.Apr 7 2016, 6:19 PM
tjablin edited edge metadata.
cycheng closed this revision.Apr 8 2016, 5:11 AM
cycheng edited edge metadata.

Committed r265781
(On behalf of Tom)