Page MenuHomePhabricator

[AMDGPU] Use reductions instead of scans in the atomic optimizer

Authored by foad on Mar 19 2021, 7:52 AM.



If the result of an atomic operation is not used then it can be more
efficient to build a reduction across all lanes instead of a scan. Do
this for GFX10, where the permlanex16 instruction makes it viable. For
wave64 this saves a couple of dpp operations. For wave32 it saves one
readlane (which are generally bad for performance) and one dpp

Diff Detail

Event Timeline

foad created this revision.Mar 19 2021, 7:52 AM
foad requested review of this revision.Mar 19 2021, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2021, 7:52 AM
b-sumner added inline comments.Mar 19 2021, 11:25 AM

This requires all lanes to be active. Are we guaranteed that the work group size will be a integer multiple of the wave size?

foad added inline comments.Mar 19 2021, 11:36 AM

The reduction or scan runs in whole wave mode. All lanes are active.


... and lanes that weren't active to start with are set to an appropriate identity value for the operation.

b-sumner added inline comments.Mar 19 2021, 12:25 PM

But suppose the launched grid has size 66. That means one wave has only 2 active lanes, and I'm not aware that WWM can actually activate the rest of them.

foad added inline comments.Mar 19 2021, 2:02 PM

That's exactly what WWM does: unconditionally activates all lanes. You can see that in the tests in this patch (both before and after my changes): s_or_saveexec_b64 s[0:1], -1 sets all bits in the exec mask.

critson accepted this revision.Mar 23 2021, 1:14 AM

LGTM - this seems like a good use of GFX10 row_xmask.

Please see minor comments.


Should this V_PERMLANEX16 be guarded as well?
Or at least have an assert?


Is it worth silencing linting of this enum?


ROW_SHARE0 is defined, but not used?
Not that I am against it existing.

This revision is now accepted and ready to land.Mar 23 2021, 1:14 AM
foad updated this revision to Diff 332583.Mar 23 2021, 2:44 AM

Address review comments, and use an early-out.

foad marked 3 inline comments as done.Mar 23 2021, 2:47 AM
foad added inline comments.

Yeah, I just added it for consistency before I had worked out which ones I would actually need.

This revision was automatically updated to reflect the committed changes.
foad marked an inline comment as done.