Currently 0 is assumed as the initial value for the counters,
and waitcnt instructions are inserted at the beginning of non-entry functions to enforce this.
This patch initializes the counters to their maximum value instead of 0.
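To make the two starting assumptions concrete, here is a minimal, self-contained sketch. It is not the code from SIInsertWaitcnts.cpp: `ToyBrackets`, `assumeZero`, `assumeMax`, `needsWait`, and the values in `kMaxPending` are hypothetical names and illustrative numbers chosen for this example only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical counter kinds, named after the counters mentioned in the review.
enum CounterKind { VM_CNT, LGKM_CNT, EXP_CNT, VS_CNT, NUM_COUNTERS };

// Illustrative per-counter maxima (assumed; real limits depend on the subtarget).
constexpr std::array<uint32_t, NUM_COUNTERS> kMaxPending = {63, 15, 7, 63};

// Toy stand-in for the pass's bracket state: outstanding operations per counter.
struct ToyBrackets {
  std::array<uint32_t, NUM_COUNTERS> pending{};

  // Assumption currently made at function entry: nothing is outstanding.
  static ToyBrackets assumeZero() { return ToyBrackets{}; }

  // What the patch proposes: assume the worst case for every counter.
  static ToyBrackets assumeMax() {
    ToyBrackets b;
    b.pending = kMaxPending;
    return b;
  }

  // True if more than `allowed` operations may still be outstanding on `kind`.
  bool needsWait(CounterKind kind, uint32_t allowed) const {
    assert(kind < NUM_COUNTERS && "not a real counter");
    return pending[kind] > allowed;
  }
};
```

With `assumeZero()`, no query ever asks for a wait until the function itself issues an operation, which is only sound if a wait at function entry has already drained the counters. With `assumeMax()`, the first use of a counter is treated as if operations might still be in flight.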
Differential D156671
[AMDGPU][SIInsertWaitcnts] Initialize the WaitcntBrackets for non-kernel functions
Abandoned, Public. Authored by jmmartinez on Jul 31 2023, 4:48 AM.
Event Timeline
Please explain what this is for. Is it a bug fix? Do you have a test that shows the intended effect?
Hello, the test that best depicts the issue is amd.endpgm.ll. In this test there is a non-kernel function, test1, that immediately calls @llvm.amdgcn.endpgm. On GFX11, I'd expect an s_sendmsg instruction before the s_endpgm for that function. SIInsertWaitcnts inserts an s_sendmsg before each s_endpgm if the VS_CNT score is not 0 and if there is any pending scratch store operation; but when SIInsertWaitcnts initializes the WaitcntBrackets object for any function (entry or not), all the counters are set to 0.
Thanks for the explanation. A simpler implementation of this would just set VS_CNT to its max value, not all the counters. However, you can only insert the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) if there are no pending scratch stores (see D153295), so I don't think this patch is acceptable.
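The constraint stated in this comment can be written as a small predicate. This is a sketch under stated assumptions, not the check from the pass: `ScratchStores` and `mayDeallocVgprsBeforeEndpgm` are hypothetical names, and treating an unknown state (for example, at the entry of a non-kernel function) as conservatively as a pending store is an assumption made for illustration.

```cpp
// Hypothetical summary of what is known about scratch stores at an s_endpgm.
enum class ScratchStores {
  None,     // known: no scratch store is outstanding
  Pending,  // known: at least one scratch store is outstanding
  Unknown   // assumed case: state inherited from a caller, not tracked
};

// The s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) may only be emitted when no
// scratch store is pending. Unknown is rejected too (assumption of this sketch).
bool mayDeallocVgprsBeforeEndpgm(ScratchStores state) {
  return state == ScratchStores::None;
}
```

Under this reading, only a state that is known to be free of scratch stores permits the message; anything else leaves the s_endpgm without it.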
Argh! I misread the code. Sorry for that. There is still something about how the WaitcntBrackets object is initialized that I find strange: it seems that the counters and the pending events are assumed to be 0 at the entry of non-kernel functions, which looks wrong to me. Am I right? Or am I misinterpreting how WaitcntBrackets works (or how the calling convention works)?
jmmartinez added inline comments.
Revision Contents
Diff 545595
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
llvm/test/CodeGen/AMDGPU/amd.endpgm.ll
llvm/test/CodeGen/AMDGPU/back-off-barrier-subtarget-feature.ll
llvm/test/CodeGen/AMDGPU/waitcnt-overflow.mir
For non-kernel functions we emit s_waitcnt 0 here so we know the counters are 0.