Page MenuHomePhabricator

[Regalloc][WIP] Increase CSR cost in RegAllocGreedy to favour splitting/spill over CSR first use
Needs ReviewPublic

Authored by wuzish on Thu, Aug 22, 1:13 AM.

Details

Reviewers
qcolombet
wmi
nemanjai
hfinkel
MatzeB
jsji
dmgreen
dexonsmith
Group Reviewers
Restricted Project
Summary

It's trying to solve the issue which was proposed at D32201 and D27366 before.

The register allocator currently favors allocating a CSR over splitting/spilling a region. This in effect results in introducing more opportunity to save/restore CSR registers at prologue/epilogue. It hurts performance especially when there is no need of prologue/epilogue for the hot path. If we can save/restore non-CSR across call, then we don't use CSR during the function and we don't need to save/restore them at prologue/epilogue.

It also influences the shrink-wrapping directly if we must use CSR during the function. For now copies of parameter registers into CSR's in the entry block when the parameter is live across any calls in the function. And of course, this disables shrink-wrapping because the save point then must be the entry block. After the patch, live ranges allocated to CSR registers will be shortened and will only be used in cold places, so that we may get better chances to do shrink-wrapping.

I refine the adjusting CSR cost to enhance the RegAllocGreedy to favour splitting/spill the virtual register which is being allocated instead of allocate CSR directly. If use CSR, we need add load/store pair in prologue/epilogue. So if the cost caused by spillings which is introduced by trying to avoid using CSR is larger than load/store pair in prologue/epilogue, we would choose to use CSR. I make the raw cost to be 1 because the cost is load/store pair in prologue/epilogue if use CSR. so the cost is 1 time spill like the cost calculated in SpillPlacer. Then we need scale the cost relative to entry frequency.

It can directly help to expand the opportunity exposed to shrink-wrapping at later phase because it causes the use of CSR is delayed to the BB with the call instead of entry BB.It's trying to avoid the magic cost number for different target. The CSRCost is a relative times corresponding to the spill operation in prologue/epilogue.

Diff Detail

Event Timeline

wuzish created this revision.Thu, Aug 22, 1:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptThu, Aug 22, 1:13 AM

This is a WIP draft and I have not fixed all check-all cases now. I want to get some quick feedback whether it's on the right direction and whether its performance is good at all different target. If you have time, could you please review it or give some comments, and help me to verify the performance at other targets?

xbolva00 added a subscriber: xbolva00.EditedThu, Aug 22, 1:24 AM

Can you add tests for atleast x86/arm and precommit them + rebase?

dexonsmith resigned from this revision.Sun, Aug 25, 8:17 AM
wuzish updated this revision to Diff 217086.Mon, Aug 26, 1:22 AM
wuzish edited the summary of this revision. (Show Details)
wuzish added a reviewer: xbolva00.

Enable it for AARCH64 and x86, and address testcases.

Could anybody run some bmk to see the performance result of AARCH64 and X86?
I have run on PowerPC, it has good influence in some bmks of spec2017 and no obvious degression.

Can the patch description please be updated to contain a dumbed down explanation of the issue this is solving?

+ @dmgreen for arm benchmarks

wuzish edited the summary of this revision. (Show Details)Mon, Sep 2, 1:05 AM
wuzish added a reviewer: dmgreen.
wuzish added a reviewer: Restricted Project.Mon, Sep 2, 10:16 PM
wuzish added a subscriber: NeHuang.Thu, Sep 5, 7:28 PM

gentle pin...

.AMDGPU also override the getCSRFirstUseCost() but your patch didn't catch that. And would you please post some improve number for powerpc of this patch ?

llvm/test/CodeGen/PowerPC/tail-dup-break-cfg.ll
1

I suggest that, you should commit a NFC patch to update the CHCK first.