This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Introduce Strict WQM mode
ClosedPublic

Authored by piotr on Feb 8 2021, 4:29 AM.

Details

Summary
  • Add amdgcn_strict_wqm intrinsic.
  • Add a corresponding STRICT_WQM machine instruction.
  • The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
  • The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.

Diff Detail

Event Timeline

piotr created this revision.Feb 8 2021, 4:29 AM
piotr requested review of this revision.Feb 8 2021, 4:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2021, 4:29 AM
critson added inline comments.Feb 8 2021, 7:34 PM
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
14

I am not sure if this is really the clearest description of strict mode and control flow.
Technically strict modes are disabled and reenabled through control flow.
Both strict and non-strict modes "respect" control-flow, but with different semantics.
Perhaps something like:
The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict WQM/WWM will always be enabled irrespective of control flow decisions.
Conversely in non-strict WQM inactive lanes may control flow decisions.

206–207

I wonder if these should now just be called toStrictMode and fromStrictMode.

llvm/test/CodeGen/AMDGPU/wqm.ll
433

It's not clear to me how this test proves the "coalesced" part?

500

Might be worth clarify that this exit is in the %if branch.

535

Same comment as previous test.

piotr added inline comments.Feb 9 2021, 5:47 AM
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
14

Thanks, I will incorporate your description in the code.

206–207

Agreed, that would be clearer - I will rename it.

llvm/test/CodeGen/AMDGPU/wqm.ll
433

I patterned this test on the existing one for WWM (test_strictwwm4), which follows the same structure. It seems the comment is wrong.

piotr updated this revision to Diff 326061.Feb 24 2021, 5:44 AM
piotr edited the summary of this revision. (Show Details)

Rebased, changed strictwqm to strict_wqm and updated tests.

piotr edited the summary of this revision. (Show Details)Feb 24 2021, 5:48 AM
piotr added inline comments.
llvm/test/CodeGen/AMDGPU/wqm.ll
433

I looked closely at the original test and I believe the check for "endif" was missing. These two cases make sure that a WWM feeding a phi node is handled properly and the mov/add happens in the same block as WWM computations.

arsenm added inline comments.Feb 24 2021, 6:12 AM
llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
180

Printing MI already includes the newline

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
109

static const, const char*?

piotr updated this revision to Diff 326349.Feb 25 2021, 4:11 AM

Fixed newline and added static/const.

critson accepted this revision.Mar 1 2021, 5:07 PM

LGTM

This revision is now accepted and ready to land.Mar 1 2021, 5:07 PM
This revision was landed with ongoing or failed builds.Mar 3 2021, 5:48 AM
This revision was automatically updated to reflect the committed changes.