This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix inserting combined s_nop in bundles
ClosedPublic

Authored by kerbowa on Oct 28 2020, 12:23 PM.

Diff Detail

Event Timeline

kerbowa created this revision.Oct 28 2020, 12:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2020, 12:23 PM
kerbowa requested review of this revision.Oct 28 2020, 12:23 PM
rampitec added inline comments.Oct 28 2020, 12:30 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.p.ll
14

Is this expected?

rampitec added inline comments.Oct 28 2020, 12:46 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.p.ll
14

According to GCNHazardRecognizer::checkRFEHazards() it needs 1 waitstate.

kerbowa added inline comments.Oct 28 2020, 1:41 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.p.ll
14

I don't think this is an RFE hazard.

rampitec added inline comments.Oct 28 2020, 1:50 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.p.ll
14

Hm... Right, there is no rfe here...

rampitec added inline comments.Oct 28 2020, 2:02 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.p.ll
14

OK, it was inserted by checkReadM0Hazards(). There is a def and use of m0, but it seem to incorrectly process BOUNDLE itself. There shall be 1 waitstate between def and use of m0. PreEmitNoopsCommon() returns 1 from checkReadM0Hazards(), and the IR is:

bb.0 (%ir-block.0):
  successors: %bb.1(0x80000000); %bb.1(100.00%)

  $m0 = S_MOV_B32 0

bb.1:
; predecessors: %bb.1, %bb.0
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)

  S_SETREG_IMM32_B32 0, 515, implicit-def $mode, implicit $mode
  BUNDLE implicit $m0, implicit $exec {
    DS_GWS_SEMA_V 0, -1, implicit $m0, implicit $exec :: (store 4 into custom "GWSResource")
    S_WAITCNT 0
  }
  renamable $sgpr0 = S_GETREG_B32 515
  S_CMP_LG_U32 killed renamable $sgpr0, 0, implicit-def $scc
  S_CBRANCH_SCC1 %bb.1, implicit killed $scc

There is def of m0 in bb.0, but it is really expired by the GWS operation.

This revision is now accepted and ready to land.Oct 28 2020, 2:02 PM
kerbowa added inline comments.Oct 28 2020, 2:16 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.p.ll
14

Yes, the redundant nop is a consequence of the order that the regions scheduled. MMB are visited in order but regions are discovered bottom-up within a block. The S_SETREG is not seen by the hazard recognizer until after the bundle.

It seems like we may want to reset the tracked instructions for hazards when starting scheduling on a new region.

This revision was landed with ongoing or failed builds.Oct 28 2020, 2:34 PM
This revision was automatically updated to reflect the committed changes.