This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid sorting stalls in regbank-reassign
ClosedPublic

Authored by rampitec on Aug 19 2020, 4:29 PM.

Details

Summary

This is the slowest operation in the already slow pass.
Instead of sorting just put a stall list into an ordered
map.

Diff Detail

Event Timeline

rampitec created this revision.Aug 19 2020, 4:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2020, 4:29 PM
rampitec requested review of this revision.Aug 19 2020, 4:29 PM
critson added inline comments.Aug 19 2020, 8:52 PM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
107–129

Is push_front more expensive if we are calling it for every candidate?
Probably just always push_back?

457

Make this an assertion combined with other change to collectCandidate suggested below?

613

Presumably we don't need to do this if we are not using weights at all (i.e. no sorting)?
Perhaps "if (UseWeight)" all of this?

rampitec added inline comments.Aug 20 2020, 12:04 AM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
107–129

It's a list, cost of insertion is O(1).

613

Good point. It is cheap, but still.

rampitec updated this revision to Diff 286721.Aug 20 2020, 12:11 AM
rampitec marked 3 inline comments as done.
rampitec added inline comments.Aug 20 2020, 12:20 AM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
613

Actually since it is cheap it makes sense to keep loop depth weight. The operand forwarding part is expensive, but the sort is not. I will experiment tomorrow and return sort and MLI part, just disable operand scan part.

Hm... On practice that is std::list::sort() takes most of the time. Maybe it is vaible to change the list to a vector of lists, where vector is sorted by equal weights.

rampitec updated this revision to Diff 286917.Aug 20 2020, 4:30 PM
rampitec retitled this revision from [AMDGPU] Disable stall weight calculation in regbank-reassign to [AMDGPU] Avoid sorting stalls in regbank-reassign.
rampitec edited the summary of this revision. (Show Details)

In fact the slowest part was sorting. I just have changed data structure to avoid sorting. Weight calculation itself turns to be not that much expensive.

This almost halves time the pass takes.

This revision is now accepted and ready to land.Aug 21 2020, 12:12 AM
foad added a subscriber: foad.Aug 21 2020, 12:12 AM
This revision was automatically updated to reflect the committed changes.