This is an archive of the discontinued LLVM Phabricator instance.

[RALLOC] Increase CSR cost in RegAllocGreedy to favour splitting over CSR first use
Needs RevisionPublic

Authored by wmi on Apr 18 2017, 5:02 PM.

Details

Summary

The patch is to solve the problem mentioned in PR29154. It is evolved from https://reviews.llvm.org/D27366.

The patch raises the CSR cost to be 1/4 of the func entry frequency, so that live range will use CSR only when its splitting cost or spill cost is close to or higher than the func entry frequency, otherwise regalloc will choose to split or spill the live range. After the split or spill, live ranges allocated to CSR registers will be shortened and will only be used in cold places, so that we may get better chances to do shrinkwrapping. Another potential benefit is that param passing copy may be sinked from entry to a colder branch.

Internal testing shows it improves several benchmarks, such as zippy (+0.8%), protobuf (+1.9%) and some other microbenchmarks without specified names.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi created this revision.Apr 18 2017, 5:02 PM
gberry added a subscriber: gberry.Apr 19 2017, 1:09 PM

In general I'm a fan of the change, however, it feels like the heuristic value is somewhat unintelligable on scan. I.e. what's the difference between 1 << 13 and 5 (or any other value)? :)

Is there something we could do to make this easier or more friendly to tune?

Thanks!

-eric

qcolombet requested changes to this revision.Jun 30 2017, 2:23 PM

I agree with Eric, the whole constant choice seems too magic to me.

This revision now requires changes to proceed.Jun 30 2017, 2:23 PM

To be concrete here. I'd like to understand why we can have the regalloc to figure out what is the cost of splitting a CSR the first time (which is what it boils down to).

Gerolf added a subscriber: Gerolf.Jul 6 2017, 3:28 PM

This is interesting. Unfortunately these heuristic changes are always tricky and never satisfying. Is this change motivated entirely by limitations of the current shrink-wrapping algorithm or do you see gains from better allocation also.

-Gerolf

Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2019, 10:35 PM

Should this good patch be rebased and continued to move on? @wmi
it'd be better to move on this work because D27366 is abandoned and D34608 is not updated for a long time, and those 3 patches are all trying to fix same issue and not landed successfully.
It would be great If it can land.

wmi added a comment.Aug 12 2019, 9:33 AM

Should this good patch be rebased and continued to move on? @wmi
it'd be better to move on this work because D27366 is abandoned and D34608 is not updated for a long time, and those 3 patches are all trying to fix same issue and not landed successfully.
It would be great If it can land.

Thanks for bringing it to attention again.

I still think the patch will be very helpful for some cases especially when dynamic profile information is available (when FDO/SampleFDO is used). It is tricky if there is only static profile available. I will reevaluate the patch and see how it works if I change the patch to be functioning only when we have dynamic profile. If you are more interesting on the case when dynamic profile is not available -- i.e., try to make the patch more general, feel free to take over it and I will be happy to join discussion and review.

In D32201#1625379, @wmi wrote:

Should this good patch be rebased and continued to move on? @wmi
it'd be better to move on this work because D27366 is abandoned and D34608 is not updated for a long time, and those 3 patches are all trying to fix same issue and not landed successfully.
It would be great If it can land.

Thanks for bringing it to attention again.

I still think the patch will be very helpful for some cases especially when dynamic profile information is available (when FDO/SampleFDO is used). It is tricky if there is only static profile available. I will reevaluate the patch and see how it works if I change the patch to be functioning only when we have dynamic profile. If you are more interesting on the case when dynamic profile is not available -- i.e., try to make the patch more general, feel free to take over it and I will be happy to join discussion and review.

How about using block frequency instead of dynamic profile information as reference if dynamic profile information is not available? And CSR cost may need to be adjusted to this comment said https://reviews.llvm.org/D27366#617886. For example, separate 2 cost, one for split and one for spill. And make the digital value sensible that if it's related to the target-dependent cost such as the equivalent number of copy which is related to the cycles between copy instruction and load/store instruction.

lkail added a subscriber: lkail.Oct 13 2021, 2:33 AM