This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][SIInsertWaitcnts] Initialize the WaitcntBrackets for non-kernel functions
AbandonedPublic

Authored by jmmartinez on Jul 31 2023, 4:48 AM.

Details

Reviewers
foad
rampitec
Summary

Currently 0 is assumed as the initial value for the counters,
and waitcnt instructions are inserted at the begining of non-entry functions to enforce this.

This patch initializes the counters to their maximum value instead of 0.

Diff Detail

Event Timeline

jmmartinez created this revision.Jul 31 2023, 4:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 4:48 AM
jmmartinez requested review of this revision.Jul 31 2023, 4:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 4:48 AM
jmmartinez added inline comments.Jul 31 2023, 4:50 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1866–1868

I'd rather move this initialization into WaitcntBrackets constructor.

foad added a comment.Jul 31 2023, 5:05 AM

Please explain what this is for. Is it a bug fix? Do you have a test that shows the intended effect?

Please explain what this is for. Is it a bug fix? Do you have a test that shows the intended effect?

Hello,

The test that best depicts the issue is amd.endpgm.ll.

In this test there is a test1 non-kernel function that inmediately does a call to @llvm.amdgcn.endpgm. On GFX11, I'd expect a s_sendmsg instruction before the s_endpgm for that function.

SIInsertWaitcnt inserts an s_sendmsg before each s_endpgm if the VS_CNT score is not 0 and if there is any pending scratch store operation; but when SIInsertWaitcnt initializes the WaitcntBrackets object for any function (entry or not) all the counters are set to 0.

foad added a comment.Jul 31 2023, 6:07 AM

Please explain what this is for. Is it a bug fix? Do you have a test that shows the intended effect?

Hello,

The test that best depicts the issue is amd.endpgm.ll.

In this test there is a test1 non-kernel function that inmediately does a call to @llvm.amdgcn.endpgm. On GFX11, I'd expect a s_sendmsg instruction before the s_endpgm for that function.

SIInsertWaitcnt inserts an s_sendmsg before each s_endpgm if the VS_CNT score is not 0 and if there is any pending scratch store operation; but when SIInsertWaitcnt initializes the WaitcntBrackets object for any function (entry or not) all the counters are set to 0.

Thanks for the explanation. A simpler implementation of this would just set VS_CNT to its max value, not all the counters.

However you can only insert the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) if there are no pending scratch stores (see D153295) so I don't think this patch is acceptable.

Please explain what this is for. Is it a bug fix? Do you have a test that shows the intended effect?

Hello,

The test that best depicts the issue is amd.endpgm.ll.

In this test there is a test1 non-kernel function that inmediately does a call to @llvm.amdgcn.endpgm. On GFX11, I'd expect a s_sendmsg instruction before the s_endpgm for that function.

SIInsertWaitcnt inserts an s_sendmsg before each s_endpgm if the VS_CNT score is not 0 and if there is any pending scratch store operation; but when SIInsertWaitcnt initializes the WaitcntBrackets object for any function (entry or not) all the counters are set to 0.

Thanks for the explanation. A simpler implementation of this would just set VS_CNT to its max value, not all the counters.

However you can only insert the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) if there are no pending scratch stores (see D153295) so I don't think this patch is acceptable.

Argh ! I miss-read the code. Sorry for that.

There is still something strange in how the WaitcntBrackets object is initialized that I find strange: It seems that the counters and the pending events are assumed to be 0 at the entry of non-kernel functions, which looks wrong to me. Am I right? Or I'm missinterpreting how WaitcntBrackets works (or how the calling convention works)?

foad added inline comments.Jul 31 2023, 6:28 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1863

For non-kernel functions we emit s_waitcnt 0 here so we know the counters are 0.

jmmartinez abandoned this revision.Jul 31 2023, 7:23 AM
jmmartinez added inline comments.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1863

I've seen this code.

If the code initialized WaitcntBrackets to something else than 0 this would still be considered when it parses the s_waitcnt instruction.

My issue with WaitcntBrackets being always initialized to 0 is that it assumes a matching waitcnt that sets all the counters to 0 and flushes any pending events is inserted at function entry (and it seems it's not the case for VS_CNT).

Anyways, I'm abandonning this revision since it's not valid,

foad added inline comments.Jul 31 2023, 7:27 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1863

About "it's not the case for VS_CNT": you are correct, but this pass mostly ignores VS_CNT anyway. A wait for VS_CNT is never inserted to resolve a data dependency, like the other counters. It is pretty much only used to decide whether to insert the dealloc vgprs message.