This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable PreRARematerialize scheduling pass with multiple high RP regions
ClosedPublic

Authored by vangthao on Mar 25 2022, 12:35 PM.

Details

Summary

Enable the PreRARematerialize pass when there are multiple high RP scheduling
regions present. Require the occupancy in all high RP regions be improved
before finalizing sinking. If any high RP region did not improve in occupancy
then un-do all sinking and restore the state to before the pass.

Diff Detail

Event Timeline

vangthao created this revision.Mar 25 2022, 12:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2022, 12:35 PM
vangthao requested review of this revision.Mar 25 2022, 12:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2022, 12:35 PM
rampitec added inline comments.Mar 29 2022, 2:45 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
658

Fix the formatting as tidy suggests.

731

Unrelated to this patch. Can you separately change this to hasOneNonDBGUse()?

739

This also needs a nodbg use (use_nodbg_instructions).

753

It has one use, so you can break from the loop here.

777

This is expensive to make a full copy of everything, even though you only need a small number of regions.

783

It is better to have an inverted logic, initialize it to false and only set to true if actually improved anything.

852

Has this *Reg* as a live-in.

870

Don't you need to remove Reg from NewLiveIns at this point for all regions?

903

It is probably also possible to call updateRegionBoundaries to restore the state.

904

return false;

933

Actually ++MinOccupancy suggests 1 step improvement. It can be higher, but you do not really know as you would need to check all other blocks too.

935

return true;

vangthao updated this revision to Diff 419598.Mar 31 2022, 6:32 PM
vangthao marked 5 inline comments as done.

Fix formatting and comments.
Reduce required copies of cached Pressure and Liveins.
Update Pressure and Liveins for all regions where the reg is live-in when sinking.

vangthao added inline comments.Mar 31 2022, 6:44 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
739

If there is a debug use before the first nondbg use, this will sink the MI after the debug use.

753

We also want to collect the def if it is live-through. If we only collect uses, we may skip a trivially rematerializable def that is used in a low RP region but is increasing RP for a high RP region due to being live-through.

870

Updated to also remove Reg from all regions' LiveIns set and re-calculate RP for regions where the Reg was live-in including regions not at MinOccupancy. This should keep cached Pressure and LiveIns up to date after sinking.

903

I am not sure if we can call updateRegionBoundaries. If we sink two defs and both regions are now empty, it will not be possible to tell which def belonged to which region because both will point to the end of block iterator.

rampitec added inline comments.Apr 1 2022, 11:44 AM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
739

The point is: codegen with and without debug info shall not differ. With this code it will.

783

There can be a lot of regions, I'd better switch to a map indexed by Idx rather than live huge unused chunks of vectors.

993

Return the newline.

vangthao updated this revision to Diff 420289.Apr 4 2022, 1:13 PM

Rebase and address review comments.

vangthao marked 4 inline comments as done.Apr 4 2022, 1:17 PM
vangthao added inline comments.
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
870

Actually we also need to update the region where the def was sunk from so cached Pressure is not up to date. Added a FIXME.

rampitec accepted this revision.Apr 4 2022, 4:37 PM

LGTM

This revision is now accepted and ready to land.Apr 4 2022, 4:37 PM
This revision was landed with ongoing or failed builds.Apr 8 2022, 1:08 PM
This revision was automatically updated to reflect the committed changes.