This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Small cleanups in wait counter code
ClosedPublic

Authored by stepthomas on Oct 25 2022, 5:51 AM.

Details

Reviewers
foad
Group Reviewers
Restricted Project
Commits
rGc8a90316fa75: [AMDGPU] Small cleanups in wait counter code
Summary

A small number of cleanups and refactors intended to enhance readability in
two passes that deal with s_waitcnt instructions.

Change-Id: Iff58371f659e8648b695868af009c5613de0e213

Diff Detail

Event Timeline

stepthomas created this revision.Oct 25 2022, 5:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 5:51 AM
stepthomas requested review of this revision.Oct 25 2022, 5:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 5:51 AM
foad added a comment.Oct 25 2022, 6:14 AM

Note that getScoreLB should always be <= getScoreUB, so code like getScoreLB < getScoreUB can be updated to getScoreRange != 0.

llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
1335

Not sure about the name. Maybe isWaitStoreCountZero would be better...?

1336

Don't need the outermost parens.

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
703

Can use getScoreRange here.

764–765

Can use getScoreRange here.

1206–1207

Can use getScoreRange here.

1598–1599

Can use getScoreRange here.

Address review comments

Rename isStoreCountWait() -> isStoreCountWaitZero(), use getScoreRange()
in more places.

Thank you for the cleanup! I left some suggestions, but I am also OK with everything as-is

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
250–251

If you did decide to add the hasPendingEvent overload I would suggest renaming this as well. All of these have the same semantics, just with different "filters", so I think they make sense as an overload set.

311–317

Some more thoughts while reviewing (although not applicable to your changes directly):

  • Why is EXP_CNT special here?
  • Is there a particular reason this relies on unsigned integer underflow? Does this map better onto the conceptual model of the scores/counts?

For either/both can anyone devise a comment to clarify? I suggested what I believe to be an equivalent version which I find easier to understand, but I am likely missing something.

704–721

Nit, but I'd also factor out the remaining common stuff

1228

I would suggest also adding a hasPendingEvent(InstCounterType T) const overload to use here, i.e.

bool hasPendingEvent(InstCounterType T) const {
  bool HasPending = PendingEvents & WaitEventMaskForInst[T];
  assert(HasPending == getScoreUB(T) > getScoreLB(T));
  return HasPending;
}
1243
1598–1599
stepthomas added inline comments.Oct 26 2022, 4:00 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
704–721

I'll decline this suggestion if I may - it actually makes it less clear to me.

1243

As I was typing this (and other cases) I was very much in two minds. I think I'll adopt this suggestion.

stepthomas added inline comments.Oct 26 2022, 4:05 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1228

I'll add this suggestion, including renaming the void parameter overloading, although I'll also change the return type to unsigned (and the void parameter overloading simply returns PendingEvents) because then I can use the new overloading in hasMixedPendingEvents() immediately blow this code (so getting an extra assertion check). It makes the assertion

assert((HasPending != 0) == (getScoreRange(T) != 0))

which is ugly, but still worth it.

Address more review comments

foad added inline comments.Oct 26 2022, 8:37 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
311–317

I don't know why EXP_CNT is handled specially, but I agree your version of the code is simpler and easier to understand.

1206–1207

You're just testing getScoreRange != 0 here, so you could use the new hasPendingEvent overload (I'm not really sure why we have both and which you should prefer)

1598–1599

I would have thought that hasPendingEvent(SMEM_ACCESS) implies hasPendingEvent(LGKM_CNT) - doesn't it?

scott.linder added inline comments.Oct 26 2022, 10:01 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
704–721

Works for me, I agree it is nice to see the whole "token" for each case in one line

scott.linder added inline comments.Oct 26 2022, 10:15 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1598–1599

+1, the redundant check for a pending LGKM_CNT event should be removed (or I suppose moved to an assert if it makes the intent wrt. the bug clearer)

Rebase and address more review comments.

Apply a suggested change to setScoreUB() that I'd somehow missed before,
and remove a couple of redundant checks.

foad accepted this revision.Oct 28 2022, 2:41 AM

LGTM, thanks!

This revision is now accepted and ready to land.Oct 28 2022, 2:41 AM
This revision was landed with ongoing or failed builds.Oct 28 2022, 3:05 AM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Oct 28 2022, 4:06 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
311–317

I did look into this a bit more, and it's to do with whether we still need to emit any wait for a counter that has overflowed. See D70418. But I still don't know why EXP_CNT is treated specially.