Page MenuHomePhabricator

[AMDGPU] Remove dubious logic in bidirectional list scheduler
ClosedPublic

Authored by foad on Oct 2 2019, 7:29 AM.

Details

Summary

pickNodeBidirectional tried to compare the best top candidate and the
best bottom candidate by examining TopCand.Reason and BotCand.Reason.
This is unsound because, after calling pickNodeFromQueue, Cand.Reason
does not reflect the most important reason why Cand was chosen. Rather
it reflects the most recent reason why it beat some other potential
candidate, which could have been for some low priority tie breaker
reason.

I have seen this cause problems where TopCand is a good candidate, but
because TopCand.Reason is ORDER (which is very low priority) it is
repeatedly ignored in favour of a mediocre BotCand. This is not how
bidirectional scheduling is supposed to work.

To fix this I changed the code to always compare TopCand and BotCand
directly, like the generic implementation of pickNodeBidirectional does.
This removes some uncommented AMDGPU-specific logic; if this logic turns
out to be important then perhaps it could be moved into an override of
tryCandidate instead.

Graphics shader benchmarking on gfx10 shows a lot more positive than
negative effects from this change.

Diff Detail

Event Timeline

foad created this revision.Oct 2 2019, 7:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 2 2019, 7:30 AM
foad added a comment.Oct 2 2019, 7:31 AM

There is quite a lot of fallout in the test suite which I have not addressed yet. I wanted to get approval for the idea first.

It would be nice to have a test for this.

Sorry, I missed your comment about tests.

However it would be nice to have a test that shows benefit. I like the reasoning in the description of this change.

arsenm added inline comments.Oct 2 2019, 9:25 AM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
237–239

The logic over here looks pretty convoluted to me

I tend to agree with the reasoning. We can only judge after benchmarking though. You say you see more gains than loses. Can you elaborate on a magnitude of those?

I guess this is just making the custom scheduling logic more closely match the generic scheduler? Makes sense to me.

foad marked 2 inline comments as done.Oct 3 2019, 6:03 AM

I tend to agree with the reasoning. We can only judge after benchmarking though. You say you see more gains than loses. Can you elaborate on a magnitude of those?

I benchmarked on gfx10 with 324 shaders from 14 different games, which I don't think I can share publically.
10 of them slowed down by more than 2%. The worst slow-down was 8.5%.
38 of them sped up by more than 2%. The best speed-up was 45%.

In addition to the overall speed-up, my hope is that it will behave a bit more consistently, because it won't be able to get stuck in the odd state where a good candidate is consistently ignored.

I guess this is just making the custom scheduling logic more closely match the generic scheduler? Makes sense to me.

Yes.

llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
237–239

The old logic or the new logic? The new logic (like most of this function) is cut n pasted from the same function in GenericScheduler, and is dictated by the rather obscure API of tryCandidate. As I understand it, tryCandidate(Cand, TryCand) means "if TryCand is better than Cand then set TryCand.Reason to the reason why".

248

@tstellar do you know or remember the reasons for all of this logic that looks at the RPDeltas?

I tend to agree with the reasoning. We can only judge after benchmarking though. You say you see more gains than loses. Can you elaborate on a magnitude of those?

I benchmarked on gfx10 with 324 shaders from 14 different games, which I don't think I can share publically.
10 of them slowed down by more than 2%. The worst slow-down was 8.5%.
38 of them sped up by more than 2%. The best speed-up was 45%.

In addition to the overall speed-up, my hope is that it will behave a bit more consistently, because it won't be able to get stuck in the odd state where a good candidate is consistently ignored.

OK, that sounds reasonable.

foad updated this revision to Diff 224344.Oct 10 2019, 7:59 AM

Update tests:

  • some I've made strictly more lenient by adding regular expressions or adding -DAG or removing -NEXT
  • some have other manual updates
  • some were automatically updated
  • I also included D68563
foad added a comment.Oct 10 2019, 8:02 AM

Reviewers: any advice on handling lots of test updates like this? I could pre-commit some of the tests, where I've made them strictly more lenient. I could also add -enable-misched=false to any tests that aren't specifically testing the scheduler, update them and pre-commit that, in order to protect them from this and future scheduler tweaks.

Given the numbers I tend to agree with the change.

Reviewers: any advice on handling lots of test updates like this? I could pre-commit some of the tests, where I've made them strictly more lenient. I could also add -enable-misched=false to any tests that aren't specifically testing the scheduler, update them and pre-commit that, in order to protect them from this and future scheduler tweaks.

I usually prefer to precommit non-essential test changes. It is less merges and smaller review.

foad updated this revision to Diff 247097.Feb 27 2020, 1:35 PM

Rebase after precommitting some test updates.

rampitec added inline comments.Feb 27 2020, 1:53 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
252

Do we check for RP limits anywhere now?

foad marked an inline comment as done.Feb 28 2020, 2:28 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
252

Yes, we call GenericScheduler::tryCandidate which does this:

// Avoid exceeding the target's limit.
if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.Excess,
                                             Cand.RPDelta.Excess,
                                             TryCand, Cand, RegExcess, TRI,
                                             DAG->MF))
  return;

// Avoid increasing the max critical pressure in the scheduled region.
if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CriticalMax,
                                             Cand.RPDelta.CriticalMax,
                                             TryCand, Cand, RegCritical, TRI,
                                             DAG->MF))
  return;
asbirlea removed a subscriber: asbirlea.Feb 28 2020, 9:41 AM
rampitec accepted this revision.Feb 28 2020, 12:04 PM

LGTM given the measurements.

This revision is now accepted and ready to land.Feb 28 2020, 12:04 PM
This revision was automatically updated to reflect the committed changes.