This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable WQM if demotes and softwqm are combined
AbandonedPublic

Authored by critson on May 4 2022, 10:56 PM.

Details

Reviewers
foad
ruiling
piotr
Summary

Demotes may be used to explicitly create helper invocations.
These helper invocations may be intend to have observable effects
in WQM, e.g. in fragment shader subgroup operations.
Facilitate this behaviour by forcing softwqm operations to be run
in WQM when demotes are present in a shader.
Conversely this allows such operations to be marked softwqm so
helper lanes are only enabled if demotes are present.

Diff Detail

Event Timeline

critson created this revision.May 4 2022, 10:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 10:56 PM
critson requested review of this revision.May 4 2022, 10:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 10:56 PM

The demote itself does not need to enable wqm. So I think soft_wqm should not depend on demote to enable wqm. If some graphics operation needs to run under wqm, it should directly use the non-soft version.

softwqm is used by our implementation of subgroup operations: if helper invocations are there, then subgroup operations must interact with them.

Helper invocations exist in one of two cases:

  1. There are "hardwqm" instructions (e.g. sampler).
  2. Invocations are explicitly demoted.

In the latter case, we need to make sure that the demoted invocations are still live for the purpose of feeding their values into those subgroup operations. Simply pretending that hardwqm instructions exists seems like a correct way to ensure that. Perhaps you have an alternative in mind @ruiling?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.softwqm.ll
284–301

Can you pre-submit this test with the incorrect code sequence (ideally, update_llc_test_checks?) and then update this review so that the diff is more clearly visible?

ruiling added a comment.EditedMay 9 2022, 7:56 PM

softwqm is used by our implementation of subgroup operations: if helper invocations are there, then subgroup operations must interact with them.

Helper invocations exist in one of two cases:

  1. There are "hardwqm" instructions (e.g. sampler).
  2. Invocations are explicitly demoted.

There are two possible sources of helper invocations:

  1. A quad may have pixels not covered by current primitive
  2. An active invocation is demoted into helper invocation.

Normally we don't activate helper invocations to optimize for memory bandwidth/power. But if graphics specification require implementation has defined behavior that need helper invocation's participation for certain shader operations, then we have to turn on the helper invocations, we need to place a non-soft wqm. My understanding of the working logic is: it is the observer(image sample, derivative, subgroupQuad operation) to decide whether we should turn on the helper invocations for a quad with at least one active invocation. My understanding of soft_wqm (a kind of weak observer) is let other strong observer to decide whether we should turn on wqm. I guess we are making this change to fix subgroupQuad*** issue, under which I believe we need a non-soft wqm. I did not follow all the details of how demote was implemented, so please correct me if I misread the change. But I still think that if there is no strong observer (hard-wqm), don't enable wqm. If an operation really need to be executed under wqm, then it should use hard-wqm.

critson updated this revision to Diff 428338.May 10 2022, 4:36 AM
  • Rebase on to pre-committed test

I think there is perhaps another way to see this.

From the specification for SPV_EXT_demote_to_helper_invocation:

Demote fragment shader invocation to a helper invocation. [...]
The implementation may terminate helper invocations before the end of the shader as an optimization, but doing so must not affect derivative calculations and does not make control flow non-uniform.

One way to read this is that demote should always create helper lanes, i.e. demote should always run WQM.
Our implementation doesn't do this and instead executes demote as a kill outside WQM regions.
In sense this is actually fixing an issue with the demote implementation, rather than with softwqm.

I agree with the description of softwqm as a "weak observer", but demote is suppose to be a strong source of observable derivatives.

Having gone over the argument with @ruiling separately, I think he has a point. Have you double-checked whether this change is still required after the related LLPC changes have been made?

critson abandoned this revision.May 10 2022, 4:35 PM

This change is not required if front-end uses explicit WQM.