Page MenuHomePhabricator

[AMDGPU] Fixed hazard recognizer to walk predecessors

Authored by rampitec on Jan 18 2019, 11:09 AM.



Fixes two problems with GCNHazardRecognizer:

  1. It only scans up to 5 instructions emitted earlier.
  2. It does not take control flow into account. An earlier instruction

from the previous basic block is not necessarily a predecessor.
At the same time a real predecessor block is not scanned.

The patch provides a way to distinguish between scheduler and
hazard recognizer mode. It is OK to work with emitted instructions
in the scheduler because we do not really know what will be emitted
later and its order. However, when pass works as a hazard recognizer
the schedule is already finalized, and we have full access to the
instructions for the whole function, so we can properly traverse
predecessors and their instructions.

Diff Detail


Event Timeline

rampitec created this revision.Jan 18 2019, 11:09 AM
arsenm added inline comments.Jan 18 2019, 12:16 PM
288 ↗(On Diff #182563)

llvm:: not necessary

295–296 ↗(On Diff #182563)


334 ↗(On Diff #182563)

These are used a bunch of places, so could use a typedef

34–35 ↗(On Diff #182563)

I'm surprised this is necessary. To clarify did you somehow only see these problems with -O0? As far as I know we don't run the standalone hazard recognizer when the scheduler is in use, so you should end up with the same issues either way.

rampitec marked an inline comment as done.Jan 18 2019, 12:20 PM
rampitec added inline comments.
34–35 ↗(On Diff #182563)

The problem exists regardless of the optimization level.
We do run standalone recognizer even when scheduler is in use. We add it from GCNPassConfig::addPreEmitPass().
In fact that is wrong to call it regardless of the scheduler, as scheduler sends regions in a bottom-up order. To ensure all hazards are correctly checked a target must run standalone recognizer, there is even a comment there about it.

rampitec updated this revision to Diff 182587.Jan 18 2019, 12:38 PM
rampitec marked 3 inline comments as done.

Addressed review comments.

rampitec marked an inline comment as done.Jan 18 2019, 12:38 PM
rampitec marked an inline comment as done.Jan 19 2019, 12:17 AM
rampitec added inline comments.
34–35 ↗(On Diff #182563)

It is wrong NOT to call it of course.

arsenm accepted this revision.Jan 19 2019, 2:27 PM


This revision is now accepted and ready to land.Jan 19 2019, 2:27 PM
vpykhtin accepted this revision.Jan 21 2019, 9:02 AM

LGTM, thanks!

This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.Apr 27 2021, 9:09 AM
foad added inline comments.

I think this early return is broken, because there might be another predecessor that has a smaller value of MinWaitStates which would not satisfy the IsExpired test. Do you agree? (Sorry for reopening such an old review.)

Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2021, 9:09 AM
Herald added a subscriber: kerbowa. · View Herald Transcript
rampitec added inline comments.Apr 27 2021, 11:22 AM

Hm... Probably. Did you try to exploit it?

critson added inline comments.Apr 27 2021, 7:27 PM

I do not think it is broken per-say but definitely confusing.
This allows the IsExpired function to trigger an early exit given a specific waitstate count.
The early-exit probe is detectable as it occurs with the MachineInstr* set to null.
And hence avoidable by return false if !MI.

Looking at all the IsExpired implementations, none of them use this functionality (or at least not correctly).
So I'll prepare a patch to remove it and tidy up.

critson added inline comments.Apr 28 2021, 1:10 AM

See D101430.