This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Force s_waitcnt after GWS instructions
Closed, Public

Authored by arsenm on Jul 10 2019, 9:35 AM.

Details

Summary

An s_waitcnt is apparently required to be the instruction immediately
following a GWS instruction, so force the GWS instruction into a bundle
with a waitcnt.
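Below is a toy C++ model of the idea. It is not the LLVM implementation. Instructions are strings in a vector, and a hypothetical `bundledWithPrev` flag stands in for LLVM's bundle marker. The hypothetical helpers `bundleGWSWithWaitcnt` and `insertAtBoundary` show why bundling guarantees adjacency. Once the GWS instruction and its `s_waitcnt 0` are glued together, a later pass that respects bundle boundaries cannot place anything between them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: an opcode name plus a flag marking it as glued to the
// previous instruction (a hypothetical stand-in for LLVM's bundle marker).
struct Inst {
  std::string op;
  bool bundledWithPrev = false;
};

// Hypothetical check for GWS opcodes (ds_gws_init, ds_gws_sema_v, ...).
bool isGWS(const Inst &I) { return I.op.rfind("ds_gws_", 0) == 0; }

// Insert a full wait ("s_waitcnt 0") directly after every GWS instruction
// and glue it to that instruction so the pair behaves as one unit.
void bundleGWSWithWaitcnt(std::vector<Inst> &Prog) {
  for (std::size_t i = 0; i < Prog.size(); ++i) {
    if (isGWS(Prog[i])) {
      Prog.insert(Prog.begin() + i + 1, Inst{"s_waitcnt 0", true});
      ++i; // skip over the waitcnt we just inserted
    }
  }
}

// A later pass that wants to insert before position `pos` is moved forward
// to the next bundle boundary, so it never splits a bundle.
void insertAtBoundary(std::vector<Inst> &Prog, std::size_t pos, Inst NewI) {
  assert(pos <= Prog.size());
  while (pos < Prog.size() && Prog[pos].bundledWithPrev)
    ++pos;
  Prog.insert(Prog.begin() + pos, NewI);
}
```

For example, starting from `v_mov_b32; ds_gws_init; s_endpgm`, bundling yields `v_mov_b32; ds_gws_init; s_waitcnt 0; s_endpgm`. A request to insert `v_add_u32` at index 2 (between the GWS instruction and its wait) is pushed past the bundle, so the waitcnt stays adjacent to the GWS instruction.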


Event Timeline

arsenm created this revision.Jul 10 2019, 9:35 AM

Is there any documentation describing this? Also, this may be a better job for the waitcnt insertion pass or the hazard recognizer, depending on the nature of the issue.

The documentation for these is really not great. The waitcnt needs to be the literal next instruction, since in some situations execution will jump back 8 bytes and retry the GWS instruction. I'm not sure why it needs to be a full wait. I didn't want to rely on some later pass to insert it, since it must be there. Having the waitcnt already present also avoids teaching the waitcnt insertion pass about this special case; its general handling of pre-existing waitcnts takes care of it instead.
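To illustrate why the waitcnt must be the literal next instruction, here is a toy address model in C++. It rests on assumptions made for illustration only. A wave stalled on a GWS operation is taken to sit at the first following `s_waitcnt`, and a retry is taken to rewind the PC by 8 bytes from there. DS instructions (including GWS) use an 8-byte encoding, and `s_waitcnt` and `v_mov_b32` use 4 bytes. The helper `replayHitsGWS` is hypothetical; it checks whether the rewind lands on the start of the GWS instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy instruction: opcode name plus encoded size in bytes
// (assumed: 8 for DS/GWS, 4 for s_waitcnt and v_mov_b32).
struct Inst {
  std::string op;
  uint32_t sizeBytes;
};

// Byte address of instruction `idx`, assuming the program starts at 0.
uint32_t addressOf(const std::vector<Inst> &Prog, std::size_t idx) {
  assert(idx <= Prog.size());
  uint32_t addr = 0;
  for (std::size_t i = 0; i < idx; ++i)
    addr += Prog[i].sizeBytes;
  return addr;
}

// Toy replay rule (an assumption, not documented hardware behavior):
// the wave waits at the first s_waitcnt after the GWS instruction, and a
// retry rewinds the PC by 8 bytes from that waitcnt.
// Returns true if the rewound PC is exactly the GWS instruction's address.
bool replayHitsGWS(const std::vector<Inst> &Prog, std::size_t gwsIdx) {
  for (std::size_t i = gwsIdx + 1; i < Prog.size(); ++i) {
    if (Prog[i].op.rfind("s_waitcnt", 0) == 0)
      return addressOf(Prog, i) - 8 == addressOf(Prog, gwsIdx);
  }
  return false; // no wait follows, so there is nothing to rewind from
}
```

With `ds_gws_init` immediately followed by `s_waitcnt 0`, the rewind lands on the GWS instruction. If a 4-byte `v_mov_b32` is scheduled between them, the waitcnt moves 4 bytes later and the rewind lands in the middle of the GWS encoding. That is why nothing may be placed between the two.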

This revision is now accepted and ready to land.Jul 10 2019, 4:45 PM
arsenm closed this revision.Jul 19 2019, 12:47 PM

r366607

My understanding is that this is mostly related to CWSR (compute wave save/restore): the trap handler has to be able to "replay" the GWS instruction.