This is an archive of the discontinued LLVM Phabricator instance.

[WIP][AArch64] Increase CSR cost when defering use of CSR is preferred
Needs ReviewPublic

Authored by junbuml on Jun 25 2017, 8:08 PM.

Details

Summary

The RegAllocGreedy currently favours allocating a CSR over splitting a region.
With this change we try to defer allocating CSR by increasing the CSR cost when
none of user blocks in the live range contains a call. Similar change was
initially suggested by Nemanja in D27366 for PowerPC and Wei posted D32201 to
increase the CSR cost.

Comparing with D27366, this change is more conservative in splitting over allocating
CSR becuase we increase the CSR cost only when none of blocks in the live range
have call. Instead of a fixed arbitrary CSR cost, we increase the CSR cost considering
the number of tradable splits/spills against a spill of a CSR in the entry block.

I'm posting this since it seems that Nemanja is no longer pursuing D27366.
Hopefully this change open up more productive discussion and make progress
in the context-sensitive CSR cost model.

With this patch, I observed +10% performance gain in spec2006/astar in AArch64.

Diff Detail

Event Timeline

junbuml created this revision.Jun 25 2017, 8:08 PM
gberry added a subscriber: gberry.Jun 26 2017, 8:08 AM
wmi added inline comments.Jun 28 2017, 4:50 PM
lib/CodeGen/RegAllocGreedy.cpp
2384–2390

I don't see why the logic here is better than D32201. It will miss the motivational case we are targeting. For a long live range of a function argument, which crosses call in a cold bb, we want to increase the first time CSRCost so that regalloc can choose to split the long live range before and after the call in the cold bb and use non-CSR register in the places other than the cold bb. As a result shrinkwrapping will be unblocked because after the splitting there will be less or no CSR use inside of entry/exit.

junbuml added inline comments.Jun 29 2017, 10:47 AM
lib/CodeGen/RegAllocGreedy.cpp
2384–2390

Hi Wei,

This change was intended to make a process for the context-sensitive CSR cost model proposed in D27366. The case this change is targeting is different from D32201. I believe we can do this change on top of D32201 or this change can be treated differently from D32201.

This change try to defer allocating CSR when we prefer not to use one in some context. For that I check if there is any user block which contain a call. If none of the blocks has call, then I said allocating a CSR is not preferred in that context. Then I increase CSRCost based on the frequency of the entry block.

Like D32201, this change encourage more shrink-wrapping. In my experiment with spec2006/astar, one of the hot function was shrink-wrapped as we discourage the first allocation of CSR for a LR which do not expand across any call.

junbuml updated this revision to Diff 104690.Jun 29 2017, 10:54 AM
junbuml edited the summary of this revision. (Show Details)

Changed the return value of getNumberOfTradableSpillsAgainstCSR from 1 to 0 in AArch64 as I didn't see any profitable case yet.

qcolombet requested changes to this revision.Jun 30 2017, 3:14 PM
qcolombet added inline comments.
include/llvm/Target/TargetLowering.h
3406

I don't think we should expose. Either regalloc can always do the right call or it cannot and we shouldn't ask the target to decide if it wants to be lucky here.

3412

I don't think we can have a simple numbers as in static number of instructions. The cost of splitting depends where it happens and not taking that into account would for sure lead to cases where we get it wrong.

3419

Ditto.

lib/CodeGen/RegAllocGreedy.cpp
2398

That formula does not make sense to me.
We should compare the cost of spilling vs. the CSRCost.
EntryFreq * #tradableSpills is not the cost of spilling.

lib/Target/AArch64/AArch64ISelLowering.cpp
10787

What do user blocks mean in this context?

This revision now requires changes to proceed.Jun 30 2017, 3:14 PM
junbuml updated this revision to Diff 105535.Jul 6 2017, 2:35 PM
junbuml edited edge metadata.
junbuml marked an inline comment as done.
junbuml added inline comments.
include/llvm/Target/TargetLowering.h
3412

I rename this to getWeightFactorToTheFirstCSRAllocation().
IIUC, the cost of splitting calculated in calculateRegionSplitCost() consider where it happen. We have to compare this split cost with the CSRCost which should be the cost for a spill of the CSR in the entry block. The return value of getWeightFactorToTheFirstCSRAllocation() approximates the number of copies which can be traded against the first spill of CSR in the entry block. In AArch64, I set 30, which means the very push/pop for the first CSR in the entry/exit block is tradable against to 30 copies in other blocks. Hopefully, this might estimate the benefit of shrink wrapping opportunity when deferring the first CSR allocation.

lib/CodeGen/RegAllocGreedy.cpp
2398

IIUC, the spilt cost is the sum of frequencies of the copies at all the related blocks, which is calculated in the current cost model. Similarly, the CSR cost should be a cost for a spill of the CSR in the entry block. I think that why we decide the CSRCost based on the frequency of the entry block in initializeCSRCost().
Since we have to compare the cost of copies with the cost of push/pop in prologue|epilogue, especially when allocating the first CSR is not preferred, I gave a weight factor through getWeightFactorToTheFirstCSRAllocation(). Hopefully this make sense to you. If not, please let me know any suggestion.

lib/Target/AArch64/AArch64ISelLowering.cpp
10787

Rename the function name and update the comment.

junbuml updated this revision to Diff 109629.Aug 3 2017, 2:11 PM
junbuml retitled this revision from [AArch64] Increase CSR cost when defering use of CSR is preferred to [WIP][AArch64] Increase CSR cost when defering use of CSR is preferred.

Kindling ping. Please let me know any comment or suggestion regarding this change.

I believe the current CSRCost fixed in initializeCSRCost() is very conservative. In most case, we allocate a CSR rather than split ranges. However, in some situation (e.g, when none of user blocks of a LR has a function call), we should have more possibility to split the range because if we allocate a CSR too early, the CSR would not be accessed in some paths even after spilling it in the entry block. When we properly defer the use of CSR we might expose more shrink wrapping opportunities.

In this change, I treated the cost of spilt as the frequencies of the copies at the related spots while the cost of using the CSR is equivalent to the first spill in the entry block. Based on that, if none of user blocks of the LR has a function call, I increased the CSR cost by multiplying the frequency of the entry block with a fixed weight factor for the spill of the first CSR as we compare the cost between copies and the spill of CSRs.

wuzish added a subscriber: wuzish.EditedAug 4 2019, 10:09 PM

Should this be rebased and continued to move on? Or it already could be addressed by https://reviews.llvm.org/D41765? @junbuml @qcolombet