- Add amdgcn_strict_wqm intrinsic.
- Add a corresponding STRICT_WQM machine instruction.
- The semantics are similar to amdgcn_strict_wwm, with one notable difference: while the intrinsic's argument is computed, not all threads are forcibly enabled, only the threads in quads that have at least one active thread.
- The difference between amdgcn_wqm and amdgcn_strict_wqm is that in strict mode an inactive lane will always be enabled, irrespective of control flow decisions.
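To illustrate which lanes the two strict modes enable, here is a minimal sketch of the lane masks for a 64-lane wave. It is an illustration only, not code from this patch: `wholeQuadMask` and `wholeWaveMask` are hypothetical helper names, and the sketch assumes a quad is four consecutive lanes.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): given the mask of currently active
// lanes in a 64-lane wave, return the mask in which every quad (assumed to be
// 4 consecutive lanes) with at least one active lane is fully enabled.
// This models the set of lanes enabled under strict WQM.
std::uint64_t wholeQuadMask(std::uint64_t exec) {
  std::uint64_t result = 0;
  for (unsigned quad = 0; quad < 16; ++quad) {
    const std::uint64_t quadBits = std::uint64_t{0xF} << (quad * 4);
    if (exec & quadBits)   // at least one lane of this quad is active
      result |= quadBits;  // so enable the whole quad
  }
  return result;
}

// Hypothetical helper: strict WWM enables every lane regardless of the
// incoming active mask.
std::uint64_t wholeWaveMask() { return ~std::uint64_t{0}; }
```

For example, with only lane 0 active, `wholeQuadMask` enables lanes 0 to 3, whereas `wholeWaveMask` enables all 64 lanes.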
Details
- Reviewers
- arsenm critson
- Commits
- rG4672bac1776e: [AMDGPU] Introduce Strict WQM mode
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp | ||
---|---|---|
14 | I am not sure if this is really the clearest description of strict mode and control flow. | |
206–207 | I wonder if these should now just be called toStrictMode and fromStrictMode. | |
llvm/test/CodeGen/AMDGPU/wqm.ll | ||
433 | It's not clear to me how this test proves the "coalesced" part? | |
500 | Might be worth clarifying that this exit is in the %if branch. | |
535 | Same comment as previous test. |
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp | ||
---|---|---|
14 | Thanks, I will incorporate your description in the code. | |
206–207 | Agreed, that would be clearer - I will rename it. | |
llvm/test/CodeGen/AMDGPU/wqm.ll | ||
433 | I patterned this test on the existing one for WWM (test_strictwwm4), which follows the same structure. It seems the comment is wrong. |
llvm/test/CodeGen/AMDGPU/wqm.ll | ||
---|---|---|
433 | I looked closely at the original test and I believe the check for "endif" was missing. These two cases make sure that a WWM feeding a phi node is handled properly and the mov/add happens in the same block as WWM computations. |