This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Allow sinking defs with multiple uses in PreRARematerialize scheduling stage
Abandoned · Public

Authored by vangthao on Apr 12 2022, 10:08 AM.

Details

Summary

In a further attempt to reduce register pressure (RP) and increase occupancy
in the PreRARematerialize stage, we can widen the criteria for sinking
trivially rematerializable defs to include defs with multiple uses, sinking
a copy directly to each use if doing so would improve occupancy.
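
The criterion described in the summary can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from this revision: the `Def` record, the `occupancy` formula (a hypothetical 256-entry register file, at most 8 waves), and the `pick_defs_to_sink` helper are all invented for illustration, and the model ignores any extra pressure the sunk copies create at their use sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Def:
    """A value defined outside a high-pressure region and live through it."""
    name: str
    trivially_rematerializable: bool
    num_uses: int


def occupancy(pressure, reg_file=256, max_waves=8):
    """Toy occupancy: how many waves fit in a hypothetical register file."""
    if pressure <= 0:
        return max_waves
    return min(max_waves, reg_file // pressure)


def pick_defs_to_sink(live_through, pressure, allow_multi_use):
    """Return the defs to sink if that raises occupancy, otherwise []."""
    base = occupancy(pressure)
    chosen = []
    for d in live_through:
        if not d.trivially_rematerializable:
            continue
        if d.num_uses > 1 and not allow_multi_use:
            continue  # the narrower criterion: single-use defs only
        chosen.append(d)
        # Each sunk def is assumed to free one register in the region.
        if occupancy(pressure - len(chosen)) > base:
            return chosen
    return []  # not enough candidates to cross an occupancy boundary


defs = [
    Def("a", True, 1),
    Def("b", True, 3),   # multiple uses: one copy would be sunk to each use
    Def("c", False, 1),  # not rematerializable, never a candidate
]
narrow = pick_defs_to_sink(defs, pressure=66, allow_multi_use=False)
wide = pick_defs_to_sink(defs, pressure=66, allow_multi_use=True)
```

In this toy setup the single-use criterion finds only one candidate, which is not enough to reach the next occupancy level, so `narrow` is empty; admitting the multi-use def lets `wide` collect two candidates and cross the boundary.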

Event Timeline

vangthao created this revision. · Apr 12 2022, 10:08 AM
Herald added a project: Restricted Project. · Apr 12 2022, 10:08 AM
vangthao requested review of this revision. · Apr 12 2022, 10:08 AM

Is there a real use case? I do not like the scheduler going that way.

This fixes the regression in SWDEV-316487. I agree that this is making the scheduler too complex. We really need a way to calculate register pressure before hoisting trivially rematerializable defs in MachineLICM, or to make this its own pass.

Is that still a problem? Wasn't it fixed by the first commit?

It is still an issue. We are not able to collect enough trivially rematerializable defs with just single-def/single-use instructions. Multiple defs are hoisted and then eliminated as redundant, which increases the use count of the defs that remain. In another case, MachineLICM hoisted parts of a reg sequence and we are unable to sink them back down because they are part of a subreg. This causes an increase in overall register pressure throughout the loop and decreases occupancy.
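
The use-count effect described in this comment can be shown with a small toy model. It is an illustrative sketch only: the tuple layout, the register names, the expression strings, and the `hoist_and_dedupe` helper are hypothetical and do not correspond to MachineLICM or any LLVM API.

```python
def hoist_and_dedupe(loop_defs):
    """Model hoisting identical defs out of a loop and merging duplicates.

    loop_defs: list of (dest_reg, expression, use_count) tuples.
    Returns {expression: (surviving_reg, combined_use_count)}.
    """
    merged = {}
    for dest, expr, uses in loop_defs:
        if expr in merged:
            # A redundant def is dropped; its uses now read the survivor.
            keep, total = merged[expr]
            merged[expr] = (keep, total + uses)
        else:
            merged[expr] = (dest, uses)
    return merged


# Three single-use defs inside the loop, two of which compute the same value.
loop_defs = [
    ("v1", "mov 16", 1),
    ("v2", "mov 16", 1),
    ("v3", "mov 32", 1),
]
after = hoist_and_dedupe(loop_defs)
single_use = [reg for reg, uses in after.values() if uses == 1]
```

Every def starts out single-def/single-use, but after merging the surviving `v1` has two uses, so a criterion restricted to single-use defs would only consider `v3`.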

vangthao abandoned this revision. · Apr 29 2022, 9:54 AM

Looking to see if we can fix this issue in register allocation, as suggested by @kerbowa.