Page MenuHomePhabricator

[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic
Needs ReviewPublic

Authored by critson on Sep 19 2019, 7:57 AM.



Add intrinsic which demotes all active lanes to helper lanes.
This is used to implement demote to helper Vulkan extension.

In practice demoting a lane to helper simply means removing it
from the mask of live lanes used for WQM/WWM/Exact mode.
Where the shader does not use WQM, demotes just become kills.

To support demote live lanes must now be tracked through entire
shader. This involves adding PHI nodes during the WQM pass,
which can expose exec mask change issues. This is overcome
by split blocks on changes from WQM/WWM to Exact mode.
As a result the WQM pass no longer preserves CFG, slot indexes
or live intervals as these is no way of maintaining these
when blocks are split.

Additionally add llvm.amdgcn.wqm.helper intrinsic to complement
demote operations. In theory can be used
to detect helper lanes; however, can be moved by LICM.
The movement of cannot be remedied without changing
its type signature and such a change would require
users to update as well.

Diff Detail

Event Timeline

critson created this revision.Sep 19 2019, 7:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2019, 7:57 AM
arsenm requested changes to this revision.Sep 19 2019, 8:53 AM

The WQM pass should be moved to avoid recomputing these. Computing LiveIntervals twice is an unacceptable cost

This revision now requires changes to proceed.Sep 19 2019, 8:53 AM
critson updated this revision to Diff 258672.Apr 20 2020, 1:19 AM

Update to move comprehensive implementation now fairly tested
for graphics.
Now preserves LiveInterval analysis.
This is dependent on D78417.

arsenm added inline comments.Apr 24 2020, 4:08 PM

I assume this needs to be convergent


Ditto, and nomem


Missing DivergenceAnalysis test


use Register

critson marked 3 inline comments as done.Apr 28 2020, 4:57 AM
critson added inline comments.

Could be, my understanding is that without flags the intrinsic is marked "has side effects", which is correct as then it will not moved by LICM or removed by CSE.


Convergent maybe (as above), but not nomem.
If this is marked nomem then it will be eaten by early CSE.
Since this was modelled on kill, is there a reason we don't mark kill Convergent?


I assume you mean the variable name Src -> Register?

critson updated this revision to Diff 260595.Apr 28 2020, 4:58 AM

Add DivergenceAnalysis test.
Mark intrinsics as convergent.
Rename a variable.

nhaehnle added inline comments.Apr 28 2020, 10:32 AM

Rethinking this, convergent isn't correct here, because there is no implied cross-thread communication.

Rather, the semantics are that amdgcn.wqm.helper reads from some hidden memory that is written to by wqm.demote. So by that logic, this should arguably be ReadInaccessibleMemOnly.


Following the logic above, this should not be convergent but WritesInaccessibleMemOnly. At least I *think* that captures the semantics most accurately.

critson updated this revision to Diff 260819.Apr 28 2020, 7:04 PM

Update intrinsic parameters:
llvm.amdgcn.wqm.demote - IntrWriteMem, IntrInaccessibleMemOnly
llvm.amdgcn.wqm.helper - IntrReadMem, IntrInaccessibleMemOnly

Ping - I would like to get this moving again.

Ping - I would like to get this moving again.

Does it make sense to wait until the WQM pass is moved?

Ping - I would like to get this moving again.

Does it make sense to wait until the WQM pass is moved?

Sure, I will revise this in light of D82190 and D82192.

Can you add globalisel tests? I think you may need to add the operand class to the output patterns

critson updated this revision to Diff 285085.Aug 12 2020, 7:33 AM
critson added a subscriber: mbrkusanin.
  • Rebase and update for changes in other code.
  • Rename intrinsic wqm.helper to (it provides live lanes).
  • Add initial GlobalISel support.
  • Add GlobalISel tests (some ported by @mbrkusanin).
foad added a subscriber: foad.Aug 21 2020, 4:37 AM
foad added inline comments.Aug 21 2020, 4:52 AM

I'm pretty sure he meant s/unsigned Src/Register Src/. clang-tidy thinks the same.