This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix DGEMM hazard for GFX90a
ClosedPublic

Authored by vangthao on Jul 27 2022, 5:54 PM.

Details

Summary

For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert
wait-states to avoid data hazard. However when there is a DGEMM instruction
in-between them, SQ incorrectly disables the wait-states thus the data hazard
needs to be handled with this workaround.

Diff Detail

Event Timeline

vangthao created this revision.Jul 27 2022, 5:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 5:54 PM
vangthao requested review of this revision.Jul 27 2022, 5:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 5:54 PM
vangthao updated this revision to Diff 448222.Jul 27 2022, 5:56 PM

Fix test name.

rampitec added inline comments.Jul 28 2022, 12:09 PM
llvm/lib/Target/AMDGPU/AMDGPU.td
282 ↗(On Diff #448222)

I do not believe it deserves a feature bit. These bits are limited.

llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
2582

There can be a block with fallthrough, and the next block may start with DGEMM. This check for MBB->end() does not work in this case.

2588

There is already a loop across uses above at line 2317, you could just add the check there.

vangthao added inline comments.Jul 28 2022, 12:16 PM
llvm/lib/Target/AMDGPU/AMDGPU.td
282 ↗(On Diff #448222)

Is there a way to check for gfx90a only? I thought about hasGFX90AInsts() but this also includes gfx940. We could exclude gfx940 with !hasGFX940Insts() but that does not seem like a clean way of doing so.

kerbowa added inline comments.Jul 28 2022, 12:22 PM
llvm/lib/Target/AMDGPU/AMDGPU.td
282 ↗(On Diff #448222)

Isn't the max number of bits 256? So we are at 143/256? Do we have a good way of checking for a specific processor beyond adding feature bits for each one?

rampitec added inline comments.Jul 28 2022, 12:28 PM
llvm/lib/Target/AMDGPU/AMDGPU.td
282 ↗(On Diff #448222)

Right. Imagine we are using a bit for every HW bug. In this case a check would be hasGFX90Insts() && !hasGFX940Insts().

vangthao added inline comments.Jul 28 2022, 1:10 PM
llvm/lib/Target/AMDGPU/AMDGPU.td
282 ↗(On Diff #448222)

Can we use AMDGPU::IsaVersion to check for gfx90a or would hasGFX90AInsts() && !hasGFX940Insts() be better? The latter does not seem future proof if more subtargets would be added that include gfx90a but not gfx940.

rampitec added inline comments.Jul 28 2022, 1:45 PM
llvm/lib/Target/AMDGPU/AMDGPU.td
282 ↗(On Diff #448222)

Either way works.

vangthao updated this revision to Diff 448671.Jul 29 2022, 10:06 AM

Remove feature bit and check hasGFX90AInsts() && !hasGFX940Insts() instead. Combine check with previous loop. Check for fallthrough blocks and added tests for fallthrough blocks.

rampitec added inline comments.Jul 29 2022, 1:38 PM
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
2337

Move it one line above, MaxWaitStates is always last.

2364

There is already Reg variable available here.

2364

Hoist MRI initialization out of the loop.

2373

This is an overkill and way insufficient at the same time (what if an instruction is a meta/debug?).

getWaitStatesSince will traverse instructions backwards one by one and take care about all of that. It will either see DGEMM before the offending VALU or not. In fact since it is only 2 waitstates DGEMM must be the first instruction it will see, otherwise you can immediately bail. To bail immediately you could use custom IsExpired function, or just capture a variable to detect a DGEMM. Whatever you prefer.

llvm/lib/Target/AMDGPU/GCNSubtarget.h
194 ↗(On Diff #448671)

Unused.

rampitec added inline comments.Jul 29 2022, 1:43 PM
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
2366

In fact you are using getWaitStatesSinceDef, so all you you need to check there is a DGEMM on path. Instead of the whole this function you could just return isDGEMM(MI.getOpcode()). No even need in a custom IsExpired or capturing anything.

vangthao updated this revision to Diff 448743.Jul 29 2022, 4:47 PM

Hoisted MRI and hazard check function out of loop. Simplififed hazard check to only keep track if we saw a DGEMM instruction on reverse traversal path to reg def.

vangthao marked 4 inline comments as done.Jul 29 2022, 4:52 PM
vangthao added inline comments.
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
2366

I believe getWaitStatesSinceDef also includes a check if the MI modifies the reg. Since the DGEMM instruction itself does not modify the reg, it will never return as a hazard. I have simplified the hazard check to include if we saw a DGEMM and if reg is defined by VALU.

This revision is now accepted and ready to land.Aug 1 2022, 10:39 AM
This revision was landed with ongoing or failed builds.Aug 1 2022, 11:59 AM
This revision was automatically updated to reflect the committed changes.