For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert
wait-states to avoid data hazard. However when there is a DGEMM instruction
in-between them, SQ incorrectly disables the wait-states thus the data hazard
needs to be handled with this workaround.
Details
- Reviewers
rampitec bcahoon - Commits
- rG7fc52d7c8b11: [AMDGPU] Fix DGEMM hazard for GFX90a
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
282 ↗ | (On Diff #448222) | I do not believe it deserves a feature bit. These bits are limited. |
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp | ||
2582 | There can be a block with fallthrough, and the next block may start with DGEMM. This check for MBB->end() does not work in this case. | |
2588 | There is already a loop across uses above at line 2317, you could just add the check there. |
llvm/lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
282 ↗ | (On Diff #448222) | Is there a way to check for gfx90a only? I thought about hasGFX90AInsts() but this also includes gfx940. We could exclude gfx940 with !hasGFX940Insts() but that does not seem like a clean way of doing so. |
llvm/lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
282 ↗ | (On Diff #448222) | Isn't the max number of bits 256? So we are at 143/256? Do we have a good way of checking for a specific processor beyond adding feature bits for each one? |
llvm/lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
282 ↗ | (On Diff #448222) | Right. Imagine we are using a bit for every HW bug. In this case a check would be hasGFX90Insts() && !hasGFX940Insts(). |
llvm/lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
282 ↗ | (On Diff #448222) | Can we use AMDGPU::IsaVersion to check for gfx90a or would hasGFX90AInsts() && !hasGFX940Insts() be better? The latter does not seem future proof if more subtargets would be added that include gfx90a but not gfx940. |
llvm/lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
282 ↗ | (On Diff #448222) | Either way works. |
Remove feature bit and check hasGFX90AInsts() && !hasGFX940Insts() instead. Combine check with previous loop. Check for fallthrough blocks and added tests for fallthrough blocks.
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp | ||
---|---|---|
2337 | Move it one line above, MaxWaitStates is always last. | |
2364 | There is already Reg variable available here. | |
2364 | Hoist MRI initialization out of the loop. | |
2373 | This is an overkill and way insufficient at the same time (what if an instruction is a meta/debug?). getWaitStatesSince will traverse instructions backwards one by one and take care about all of that. It will either see DGEMM before the offending VALU or not. In fact since it is only 2 waitstates DGEMM must be the first instruction it will see, otherwise you can immediately bail. To bail immediately you could use custom IsExpired function, or just capture a variable to detect a DGEMM. Whatever you prefer. | |
llvm/lib/Target/AMDGPU/GCNSubtarget.h | ||
194 ↗ | (On Diff #448671) | Unused. |
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp | ||
---|---|---|
2366 | In fact you are using getWaitStatesSinceDef, so all you you need to check there is a DGEMM on path. Instead of the whole this function you could just return isDGEMM(MI.getOpcode()). No even need in a custom IsExpired or capturing anything. |
Hoisted MRI and hazard check function out of loop. Simplififed hazard check to only keep track if we saw a DGEMM instruction on reverse traversal path to reg def.
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp | ||
---|---|---|
2366 | I believe getWaitStatesSinceDef also includes a check if the MI modifies the reg. Since the DGEMM instruction itself does not modify the reg, it will never return as a hazard. I have simplified the hazard check to include if we saw a DGEMM and if reg is defined by VALU. |
Move it one line above, MaxWaitStates is always last.