- User Since
- Sep 4 2018, 4:49 AM (116 w, 2 d)
Sun, Nov 22
Wed, Nov 11
Tue, Nov 10
- Use/reuse single virtual register for live mask. This removes need for PHIs and live mask register tracking. Assumes WQM is running non-SSA. (Supporting non-SSA and SSA operation would bloat code.)
- Live mask tracks all kills and demotes.
- Live mask manipulations terminate shader if all lanes are killed, even in non-uniform control flow.
- Move all kill lowering to WQM pass. This simplifies later passes and avoids duplication when updating live mask. Removes the need for "clean up" operations.
- WQM pass always modifies shader if it has any kills or demotes, even if there is no WQM.
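
The live mask behaviour listed above can be pictured with a small host-side model. This is only a sketch and is not LLVM code: `LiveMaskModel`, `killLanes`, and the 64-bit lane mask are hypothetical names and choices made for illustration. It assumes kills and demotes both simply clear lanes from one shared mask.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (not LLVM code): one bit per lane, held in a single reused value,
// mirroring the idea of a single live mask register rather than SSA values.
struct LiveMaskModel {
  std::uint64_t Live;      // lanes that have not been killed or demoted
  bool Terminated = false; // set once every lane has been removed

  explicit LiveMaskModel(std::uint64_t Initial) : Live(Initial) {
    assert(Initial != 0 && "model assumes at least one live lane at entry");
  }

  // Record a kill or demote of the given lanes. The shader is treated as
  // terminated as soon as no live lanes remain, regardless of where in the
  // control flow the update happens.
  void killLanes(std::uint64_t Lanes) {
    if (Terminated)
      return;
    Live &= ~Lanes;
    if (Live == 0)
      Terminated = true;
  }
};
```

Starting from a mask of `0xF`, killing lanes `0x3` leaves `0xC` live. Killing `0xC` afterwards empties the mask and marks the model as terminated.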
Mon, Nov 9
- Add tests, but these depend on D91066
Sun, Nov 8
- Address reviewer comments.
- Invert condition.
- Add tests against a symbol.
Sat, Nov 7
Fri, Nov 6
Thu, Nov 5
Fri, Oct 30
Wed, Oct 28
- Remove restrictions on types of shader where early terminate can occur.
Superseded by 419168d9381959ec6850e9e87aff9d062b68ef4b
Oct 27 2020
Oct 26 2020
Oct 21 2020
Oct 20 2020
- Fix markDefs to iterate all operands of MI
- Remove fix up for SI_ELSE as this is no longer required
- Remove elimination of trivial SGPR to SGPR WWM copies (this adds cruft in atomic optimizer tests)
Thanks, I was going to make the same change, but you beat me to it.
Slight nit: since we use Width - 1 three times, for readability I think we should just declare a new variable for it (TableIdx?).
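
A minimal sketch of what the suggestion could look like. The tables, their contents, and the `lookupForWidth` helper are hypothetical placeholders, not the code under review. Only `Width` and `TableIdx` come from the comment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical lookup tables indexed by (Width - 1); values are placeholders.
constexpr std::array<unsigned, 4> kOffsets = {0, 1, 3, 7};
constexpr std::array<unsigned, 4> kCounts = {1, 2, 4, 8};
constexpr std::array<unsigned, 4> kMasks = {0x1, 0x3, 0xF, 0xFF};

struct WidthInfo {
  unsigned Offset;
  unsigned Count;
  unsigned Mask;
};

WidthInfo lookupForWidth(unsigned Width) {
  assert(Width >= 1 && Width <= kOffsets.size() && "Width out of range");
  // Name the index once instead of repeating `Width - 1` at each use.
  const std::size_t TableIdx = Width - 1;
  return {kOffsets[TableIdx], kCounts[TableIdx], kMasks[TableIdx]};
}
```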
Oct 19 2020
- Address review comments
Oct 18 2020
- Consistently use Register type.
Oct 17 2020
Oct 16 2020
Oct 15 2020
Oct 14 2020
- Remove peephole
- Pre-commit test
- Use std::array for map array.
Oct 13 2020
- Use std::array and tidy up initialisation.
- Fix number of rows in table.
Oct 12 2020
Having addressed the comments could I get a second quick read before I submit?
Address review comments:
- Fix initialiser to use AMDGPU::NoSubRegister and not memset.
- Add comment on mapping array.
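
As a rough illustration of the initialiser change, the sketch below fills a `std::array`-based map with a named sentinel instead of using `memset`. The table dimensions, the `SubRegMap` alias, `makeEmptyMap`, and the local `kNoSubRegister` constant are assumptions for this example. In the actual patch the sentinel is `AMDGPU::NoSubRegister`, and this sketch does not depend on its real value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Placeholder standing in for AMDGPU::NoSubRegister (hypothetical value).
constexpr std::uint16_t kNoSubRegister = 0;

// Hypothetical dimensions for the mapping array.
constexpr std::size_t kNumRows = 4;
constexpr std::size_t kNumCols = 8;

// Mapping array: each entry holds a sub-register index, or kNoSubRegister
// when no mapping exists for that (row, column) pair.
using SubRegMap = std::array<std::array<std::uint16_t, kNumCols>, kNumRows>;

SubRegMap makeEmptyMap() {
  SubRegMap Map;
  // Fill with the named sentinel rather than memset: this stays correct
  // whatever the sentinel's value is and makes the intent explicit.
  for (auto &Row : Map)
    Row.fill(kNoSubRegister);
  return Map;
}
```

Filling by value avoids relying on the sentinel having an all-zero byte pattern, which `memset` would silently assume.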
Address reviewer comments.
Oct 11 2020
This has been superseded by front-end work in graphics compiler.
To motivate the peephole.
This pattern affects 2% of graphics shaders on GFX9, and nearly 7% on GFX10.
On average we save ~1.5 instructions per affected shader.
On some VulkanCTS tests the savings are much higher.
Given the relatively low gain, I assumed it was not worth introducing a new peephole pass, and took this approach to address the duplicate s_mov instructions at the point of generation (when they are cheapest to spot).
Oct 10 2020
- Merge code generation loops to avoid needing to generate work list
- Fix potential issue when all elements of copy are overwritten
- Fix test file location error
Oct 8 2020
Oct 7 2020
- Fix assumptions about SCC live intervals which are not valid late in compilation.
Oct 6 2020
- Address comments about pass insertion.
- Fix bug in removal of trivial SGPR copies from WWM.
Oct 5 2020
Oct 3 2020
Add missing variable reset.
Oct 2 2020
Remove more unused code.
Oct 1 2020
Sep 30 2020
Revise the fix to address the underlying problem in the original code.
Sep 24 2020
SI_KILL_CLEANUP pseudos are inserted to mark points where control flow merges and hence the exec mask can be evaluated for early termination of a pixel shader.
These early terminations are added by SIInsertSkips which contains the logic for determining if it is safe to early terminate at a given point.
Sep 23 2020
Sep 22 2020
This passes as much of VulkanCTS as stock LLVM does for graphics.
I still need to do some porting work so I can test performance impact.