This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Simplify the exclusive scan used for optimized atomics
Closed · Public

Authored by foad on Jul 9 2019, 6:13 AM.

Details

Summary

Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:

  1. It simplifies the compiler a little.
  2. It minimizes VGPR pressure because each instruction is now of the form vn = vn + (vn << c).
  3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction.

Because of #2 and #3 the end result is improved from this:

v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
v_mov_b32_dpp v1, v3  row_shr:3 row_mask:0xf bank_mask:0xf
v_add3_u32 v1, v4, v5, v1
s_nop 1
v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

To this:

v_add_u32_dpp v1, v1, v1  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
s_nop 1
v_add_u32_dpp v1, v1, v1  row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
s_nop 1
v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

I.e. two fewer computational instructions, one extra nop where we could
schedule something else.
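The equivalence of the two sequences can be checked with a small lane-level simulation. The following is a rough sketch, not the compiler's implementation: the helper names (row_shr, row_bcast, add_dpp, mov_dpp, old_scan, new_scan) are made up for illustration, and the DPP model is a simplified assumption (64 lanes, rows of 16 lanes, banks of 4 lanes, out-of-row reads yield 0 only when bound_ctrl:0 is given and otherwise leave the lane unwritten). It also assumes that v5 and v1 hold 0 before the two v_mov_b32_dpp instructions of the old sequence, and it ignores 32-bit wraparound. Under those assumptions both sequences produce the same running sum of their input register.

```python
from itertools import accumulate

ROW = 16    # lanes per DPP row (assumed model)
WAVE = 64   # lanes per wavefront (assumed model)

def row_shr(src, n):
    """Per-lane source for row_shr:n; None marks a read from outside the row."""
    return [src[i - n] if i % ROW >= n else None for i in range(WAVE)]

def row_bcast(src, last):
    """Per-lane source for row_bcast:15 / row_bcast:31 (simplified model):
    each lane reads the final lane of the preceding 16- or 32-lane group."""
    width = last + 1
    out = []
    for i in range(WAVE):
        idx = (i // width) * width - 1
        out.append(src[idx] if idx >= 0 else None)
    return out

def _written(i, sel, row_mask, bank_mask, bound_ctrl):
    """True if lane i is written under the given masks and bound_ctrl."""
    row, bank = i // ROW, (i % ROW) // 4
    if not ((row_mask >> row) & 1 and (bank_mask >> bank) & 1):
        return False
    return sel[i] is not None or bound_ctrl

def add_dpp(v, sel, row_mask=0xF, bank_mask=0xF, bound_ctrl=False):
    """Model of 'v_add_u32_dpp v, v, v': written lanes get sel + v."""
    out = list(v)
    for i in range(WAVE):
        if _written(i, sel, row_mask, bank_mask, bound_ctrl):
            out[i] = (sel[i] or 0) + v[i]
    return out

def mov_dpp(old, sel, row_mask=0xF, bank_mask=0xF, bound_ctrl=False):
    """Model of 'v_mov_b32_dpp': written lanes get sel, others keep old."""
    out = list(old)
    for i in range(WAVE):
        if _written(i, sel, row_mask, bank_mask, bound_ctrl):
            out[i] = sel[i] or 0
    return out

def _tail(v):
    """The four instructions shared by both sequences."""
    v = add_dpp(v, row_shr(v, 4), bank_mask=0xE)
    v = add_dpp(v, row_shr(v, 8), bank_mask=0xC)
    v = add_dpp(v, row_bcast(v, 15), row_mask=0xA)
    v = add_dpp(v, row_bcast(v, 31), row_mask=0xC)
    return v

def old_scan(v3):
    """Shifts by 1, 2 and 3 followed by a 3-way add."""
    zero = [0] * WAVE  # assumption: v5 and v1 are zero-initialised
    v4 = add_dpp(v3, row_shr(v3, 1), bound_ctrl=True)
    v5 = mov_dpp(zero, row_shr(v3, 2))
    v1 = mov_dpp(zero, row_shr(v3, 3))
    v1 = [a + b + c for a, b, c in zip(v4, v5, v1)]  # v_add3_u32
    return _tail(v1)

def new_scan(v1):
    """Power-of-two shifts only."""
    v1 = add_dpp(v1, row_shr(v1, 1), bound_ctrl=True)
    v1 = add_dpp(v1, row_shr(v1, 2), bound_ctrl=True)
    return _tail(v1)

# Usage: compare both sequences against a plain running sum.
lanes = [(7 * i + 3) % 11 for i in range(WAVE)]
reference = list(accumulate(lanes))
print(new_scan(lanes) == reference, old_scan(lanes) == reference)
```

Note that the sketch only models the instructions listed above, which compute a running sum that includes each lane's own input. How the input register is prepared so that the overall result is the exclusive scan named in the title is outside its scope.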

Diff Detail

Repository
rL LLVM

Event Timeline

foad created this revision. Jul 9 2019, 6:13 AM
Herald added a project: Restricted Project. Jul 9 2019, 6:13 AM
foad added a comment. Jul 9 2019, 6:16 AM

See also comments here about whether to include the shift by 3: https://reviews.llvm.org/D57737#1392824

arsenm accepted this revision. Jul 18 2019, 5:34 PM

LGTM

This revision is now accepted and ready to land. Jul 18 2019, 5:34 PM
This revision was automatically updated to reflect the committed changes.