This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR
ClosedPublic

Authored by rampitec on Aug 31 2022, 4:05 PM.

Details

Summary

In this case gfx90a uses v0 instead of the correct register. Swap
the value temporarily with a lower register and then swap it back.

Unfortunately hazard recognizer works after wait count insertion,
so we cannot simply reuse an arbitrary register, hence w/a also
includes a full waitcount. This can be avoided if we run it from
expandPostRAPseudo, but that is a complete misplacement.

Diff Detail

Event Timeline

rampitec created this revision.Aug 31 2022, 4:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2022, 4:05 PM
rampitec requested review of this revision.Aug 31 2022, 4:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2022, 4:05 PM
Herald added a subscriber: wdng. · View Herald Transcript

Also we could still constrain src0 of these to a decimated RC in the finalizeLowering and keep this w/a just in case further optimizations would replace the operand. In that case this will be an extremely rare situation we hit the w/a itself. Although constraining the RC will have its own negative impact on the register allocation.

The other problem each of these new instructions can create a new hazard just by itself. So we probably need to reset the iterator to a first new instruction or run recognizer on each of them.

Alternatively it can be possible to really run this earlier, from post-ra pseudo expansion.

foad added a comment.Sep 1 2022, 2:20 AM

Also we could still constrain src0 of these to a decimated RC in the finalizeLowering and keep this w/a just in case further optimizations would replace the operand.

If you use a different RC for src0 then surely you don't need the workaround? The rest of the compiler should respect the RC, and should not "replace the operand" with a physreg that is not in the right RC.

Also we could still constrain src0 of these to a decimated RC in the finalizeLowering and keep this w/a just in case further optimizations would replace the operand.

If you use a different RC for src0 then surely you don't need the workaround? The rest of the compiler should respect the RC, and should not "replace the operand" with a physreg that is not in the right RC.

That is if I define a whole new set of shifts for this target. This would create a whole new set of problems. If I just constrain existing vgpr32 operand to a decimated RC it will eventually be replaced by some subsequent passes.

foad added a comment.Sep 1 2022, 2:39 AM

That is if I define a whole new set of shifts for this target. This would create a whole new set of problems.

OK, understood. We are doing something similar for 16-bit operands for GFX11, but it is a lot of work.

constrain existing vgpr32 operand to a decimated RC

"octimated", surely? :-) :-)

constrain existing vgpr32 operand to a decimated RC

"octimated", surely? :-) :-)

For sure yes, although list operator is DECIMATE, even if with an 8 as argument.

rampitec added a comment.EditedSep 1 2022, 3:08 AM

One more thing to note: in most of the cases shift amount is an immediate, and even an inline literal. So we need to be really lucky to get here.

rampitec updated this revision to Diff 457378.Sep 1 2022, 1:32 PM

Added run of the recognizer on the newly inserted swaps and tests for it.

kerbowa added inline comments.Sep 6 2022, 9:52 PM
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
311

Do you need to reset CurrCycleInstr after calling this function? Should this work with process bundles?

rampitec updated this revision to Diff 458488.Sep 7 2022, 9:55 AM
rampitec marked an inline comment as done.

Fixed work inside a bundle.

llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
311

AdvanceCycle() sets CurrCycleInstr to nullptr itself.

Good point about bundle, fixed.

kerbowa accepted this revision.Sep 7 2022, 2:03 PM

LGTM

This revision is now accepted and ready to land.Sep 7 2022, 2:03 PM
This revision was landed with ongoing or failed builds.Sep 7 2022, 2:24 PM
This revision was automatically updated to reflect the committed changes.