This is an archive of the discontinued LLVM Phabricator instance.

[WIP][AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already
AbandonedPublic

Authored by jmmartinez on Jul 31 2023, 5:21 AM.

Details

Summary

Currently a WIP, I've only updated a single test to depict the change (there are ~50 tests that need update).

Diff Detail

Event Timeline

jmmartinez created this revision.Jul 31 2023, 5:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 5:21 AM
jmmartinez requested review of this revision.Jul 31 2023, 5:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 5:21 AM
  • Rebase over main
jmmartinez added inline comments.Jul 31 2023, 8:01 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_uinc_wrap.ll
71

TODO: check if the s_waitcnt_vscnt is removed when the function is a non-entry function.

This needs a wider discussion. The current philosophy in SIInsertWaitcnts is that it will never remove a pre-existing waitcnt, since it could have been added by the user to work around a hardware bug in ways that the hardware does not understand. We could change that philosophy but it would need wider agreement.

The secondary problem is that SIInsertWaitcnts can't distinguish a waitcnt added by the user from one added by SIMemoryLegalizer. It would be great if SIInsertWaitcnts could remove the latter.

Adding @stepthomas for awareness.

We need to be completely sure that the removed waitcnt added by the memory legalizer is a noop and shall not be impacted later in the pass by hoisted waitcnt or merged brackets. As is this patch may cause unexpected and hard to debug errors. In principle, it should be fine to remove waitcnt if we can be sure that the counter is guaranteed to be 0.

jmmartinez updated this revision to Diff 547793.Aug 7 2023, 7:44 AM
  • Fixed initialization of WaitcntBrackets structure for non-kernel functions (since the counters were initialized to 0, the inserted s_waitcnt 0 at function-entry was removed)
  • Updated tests to help spot potential issues with the patch:
    • One issue is that some memory-fence tests with a kernel doing only a memory fence become irrelevant since the wait instructions are removed. Adding some memory accesses to those would help.
  • Added CodeGen/AMDGPU/preserve-user-waitcnt.ll to test how user-inserted waits are handled. Currently, with __builtin_amdgcn_s_waitcnt it is possible that the waitcnt instruction is going to be removed

Thanks for the comments @foad & @kerbowa !

I agree that this transformation would need wider agreement (and more testing than our ci pipeline!).

I'm currently trying to find a way to distinguish between waitcnt instructions coming from the memory-legalizer and the rest.
But I'm far less familiar with the MachineInstruction API than with LLVM-IR.
Is there something like Metadata or Flags that we can turn on/off to indiciate that an instruction is coming from the legalizer?

foad added a comment.Aug 7 2023, 7:51 AM

I'm currently trying to find a way to distinguish between waitcnt instructions coming from the memory-legalizer and the rest.
But I'm far less familiar with the MachineInstruction API than with LLVM-IR.
Is there something like Metadata or Flags that we can turn on/off to indiciate that an instruction is coming from the legalizer?

I think you should probably use different opcodes for this, e.g. invent something like S_SOFT_WAITCNT or S_WAITCNT_PSEUDO for the ones inserted by SIMemoryLegalizer.

  • Added S_SOFT_WAITCNT and S_SOFT_WAITCNT_VSCNT
  • There is some refactoring in between that I should move to another PR
jmmartinez added inline comments.Aug 16 2023, 6:34 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
920

Only simplify the wait for soft waitcnt, otherwise this may relax a hard waitcnt.

  • Do not relax "hard" s_waitcnt instructions
  • Rebased
  • Only 1 test failing: I have to rethink this remaining test consisting mostly of kernels doing only a memory fence. These become just an s_endpgm since there are no memory accesses to fence.
arsenm added inline comments.Aug 18 2023, 5:55 AM
llvm/lib/Target/AMDGPU/SOPInstructions.td
1277–1282

These look too much like a real instruction, should use a lowercase suffix and the same prefix as the real instruction

  • Updated the pseudo instruction names to S_WAITCNT_soft and S_WAITCNT_VSCNT_soft
arsenm added inline comments.Aug 24 2023, 8:12 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8293–8295

This should use PseudoInstExpansion (which we apparently only have 2 uses of, I thought there were more)

llvm/lib/Target/AMDGPU/SIInstrInfo.h
845

don't use optional for this. I would assume this is just invalid to call for a non-waitcnt

865

Document what a soft waitcnt is

jmmartinez marked an inline comment as done.
  • Added a few comments to describe what a soft waitcnt is
  • Take into account reviews
  • Rebased
jmmartinez added inline comments.Aug 25 2023, 2:44 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8293–8295

I'm not sure how to do this.

I've tried with the following, but it seems that result cannot be another pseudo:

SOPInstructions.td:1281:5: error: In pseudo instruction 'S_WAITCNT_soft', result operator 'S_WAITCNT' cannot be a pseudo instruction
def S_WAITCNT_soft : SOPP_Pseudo <"s_soft_waitcnt" , (ins SWaitCnt:$simm16), "$simm16">, 
                                    PseudoInstExpansion<(S_WAITCNT SWaitCnt:$simm16)>;
foad added inline comments.Sep 8 2023, 3:22 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8293–8295

There are a bunch of places where we ant multiple Pseudos to map to the same Real. I'd love to have a good consistent way to implement this.