Page MenuHomePhabricator

AMDGPU/InsertWaitcnts: Simplify pending events tracking
ClosedPublic

Authored by nhaehnle on Nov 7 2018, 2:16 PM.

Details

Summary

Instead of storing the "score" (last time point) of the various relevant
events, only store whether an event is pending or not.

This is sufficient, because whenever only one event of a count type is
pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).

This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK.
I do not understand what this special handling ever attempted to achieve.
It has existed ever since the original port from an internal code base,
so my best guess is that it solved a problem related to EXEC handling in
that internal code base.

Diff Detail

Repository
rL LLVM

Event Timeline

nhaehnle created this revision.Nov 7 2018, 2:16 PM
t-tye added a comment.Nov 7 2018, 3:56 PM

This is sufficient, because whenever only one event of a count type is

pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).

Just wondered if can do better than using 0. Instead can the lowest count be used as this should be sufficient to ensure all out-of-order events in this have happened? I had discussed this with Bob at one time.

This is sufficient, because whenever only one event of a count type is

pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).

Just wondered if can do better than using 0. Instead can the lowest count be used as this should be sufficient to ensure all out-of-order events in this have happened? I had discussed this with Bob at one time.

Hmm, how would that work? What lowest count are you referring to? For example, if lgkm has both in-flight SMEM read, and in-flight LDS, we could either have all SMEM read finish first or all LDS finish first.

Something that we could do is a more finely-grained tracking of in-order events. For example, if we have both in-flight SMEM and in-flight LDS, and we need to wait for the second-to-last LDS, then in fact we could do an lgkmcnt(1) wait -- because if the counter reaches 1 or less, the second-to-last LDS must have returned. After the lgkmcnt(1), we still need to conservatively assume that any event type that was previously in-flight may still be in-flight, so this patch here is compatible with such a more finely-grained tracking.

I think the finer-grained tracking could be achieved by introducing separate timelines for each event type: currently we only have timelines by counter. Anyway, it'd be a separate change, mainly for the benefit of mixing LDS and SMEM I think.

t-tye added a comment.Nov 9 2018, 9:25 AM

This is sufficient, because whenever only one event of a count type is

pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).

Just wondered if can do better than using 0. Instead can the lowest count be used as this should be sufficient to ensure all out-of-order events in this have happened? I had discussed this with Bob at one time.

Hmm, how would that work? What lowest count are you referring to? For example, if lgkm has both in-flight SMEM read, and in-flight LDS, we could either have all SMEM read finish first or all LDS finish first.

Something that we could do is a more finely-grained tracking of in-order events. For example, if we have both in-flight SMEM and in-flight LDS, and we need to wait for the second-to-last LDS, then in fact we could do an lgkmcnt(1) wait -- because if the counter reaches 1 or less, the second-to-last LDS must have returned. After the lgkmcnt(1), we still need to conservatively assume that any event type that was previously in-flight may still be in-flight, so this patch here is compatible with such a more finely-grained tracking.

I think the finer-grained tracking could be achieved by introducing separate timelines for each event type: currently we only have timelines by counter. Anyway, it'd be a separate change, mainly for the benefit of mixing LDS and SMEM I think.

Right that was what I was meaning. Even though some waitcnt counters (such as lgkm) are an amalgam of several other internal counters, some of those internal counters are still in order. So tracking the internal counters avoids having to do the overly conservative use of 0.

It can also mean that some waitcnts may be eliminated since it may be that it is known there are no outstanding operations associated with an internal counter due to a previous waitcnt plus the fact that no operations for that internal counter have occurred subsequently.

Ping?

I think the remarks by @t-tye point to a potentially useful optimization, but that should not be part of this patch.

t-tye added a comment.Nov 19 2018, 8:27 AM

Ping?

I think the remarks by @t-tye point to a potentially useful optimization, but that should not be part of this patch.

Agree that optimization could be done as a separate patch. Would a Fixme comment be a good idea to record the idea in the appropriate place?

nhaehnle updated this revision to Diff 175041.Nov 22 2018, 5:16 AM
  • add TODO comment about the opportunity of keeping per-event timelines
This revision is now accepted and ready to land.Nov 22 2018, 9:37 AM
This revision was automatically updated to reflect the committed changes.