Revise the fix to address the underlying problem in the original code.
Thu, Sep 24
SI_KILL_CLEANUP pseudos are inserted to mark points where control flow merges and hence the exec mask can be evaluated for early termination of a pixel shader.
These early terminations are added by SIInsertSkips which contains the logic for determining if it is safe to early terminate at a given point.
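As a rough illustration of the merge-point idea (a toy CFG model, not the actual SIInsertSkips data structures): the candidate points are simply blocks with more than one predecessor, where diverged lanes rejoin and the exec mask becomes worth re-checking.

```python
def merge_points(preds):
    """preds: dict mapping block name -> list of predecessor block names.
    Returns the blocks where control flow merges (more than one predecessor)."""
    return {b for b, ps in preds.items() if len(ps) > 1}

# A diamond CFG: entry branches to then/else, which rejoin at join.
cfg = {"entry": [], "then": ["entry"], "else": ["entry"], "join": ["then", "else"]}
```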
Wed, Sep 23
Tue, Sep 22
This passes VulkanCTS as much as stock LLVM does for graphics.
I still need to do some porting work so I can test performance impact.
Thu, Sep 17
Add basic test.
Wed, Sep 16
Are there any other significant changes? I'm thinking of things like this that would help with dependency stalls on gfx10:

  v_mov v0, 0
  v_mov v1, v0
->
  v_mov v0, 0
  v_mov v1, 0
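For illustration, that kind of constant forwarding through copies can be sketched over a toy instruction list (a hypothetical representation; this is not the LLVM copy-propagation code):

```python
def forward_copies(insts):
    """insts: list of (dst, src) pairs, where src is a register name (str)
    or an immediate (int). Forwards known constants through register reads."""
    known = {}  # register -> constant value it currently holds
    out = []
    for dst, src in insts:
        if isinstance(src, str) and src in known:
            src = known[src]          # replace register read with known constant
        if isinstance(src, int):
            known[dst] = src          # dst now holds a known constant
        else:
            known.pop(dst, None)      # dst no longer holds a known constant
        out.append((dst, src))
    return out
```

With this sketch, `[("v0", 0), ("v1", "v0")]` becomes `[("v0", 0), ("v1", 0)]`, matching the rewrite above.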
I started this change as I am looking at moving WQM after MI scheduling, and this potentially leaves some additional copies around.
But I could address these with very limited special-case copy elimination in the WQM pass itself.
Does anyone have an opinion on whether I should continue pushing this?
Honestly it does not sound like a good justification for a new pass.
Agreed, but maybe it's worth revisiting after the WQM pass has been moved.
Follow up verifier fix for issue noted by Stas in D87748.
Tue, Sep 15
The numbers for this change are not vastly compelling.
I looked at 11598 game shaders and compiled these for GFX7, GFX9 and GFX10.
On GFX7, 1 shader lost 1 instruction.
On GFX9, 1 shader lost 1 instruction, but 64 shaders gained 1 instruction.
On GFX10, 1 shader lost 1 instruction, but 2 shaders gained 1 instruction.
Mon, Sep 14
Sun, Sep 13
Aug 21 2020
Aug 19 2020
Aug 13 2020
Aug 12 2020
- Rebase on to pre-committed test.
- Rebase and update for changes in other code.
- Rename intrinsic wqm.helper to wqm.live (it provides live lanes).
- Add initial GlobalISel support.
- Add GlobalISel tests (some ported by @mbrkusanin).
Aug 8 2020
I tried this change with game traces on GFX10.
Aug 5 2020
Aug 3 2020
Jul 31 2020
- Exclude sub-registers that cover all banks from stall calculations.
- Fix further edge cases involved with sub-registers.
- Add additional tests to cover bugs fixed.
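The sub-register exclusion can be illustrated with a toy bank model (the bank count and the lane-to-bank mapping below are assumptions for the sketch; real hardware mappings are target-specific):

```python
NUM_BANKS = 4  # hypothetical bank count; real hardware differs per target

def banks_covered(start_lane, num_lanes):
    """Banks touched by a register occupying num_lanes consecutive
    32-bit lanes starting at start_lane."""
    return {(start_lane + i) % NUM_BANKS for i in range(num_lanes)}

def counts_for_stalls(start_lane, num_lanes):
    """A sub-register that covers every bank will conflict no matter how
    registers are reassigned, so it is excluded from stall calculations."""
    return banks_covered(start_lane, num_lanes) != set(range(NUM_BANKS))
```

Under this model, a 4-lane-wide sub-register covers all banks and is excluded, while a 2-lane one still participates.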
Jul 30 2020
Correctly handle determining banks for wide sub-registers.
Jul 29 2020
Jul 16 2020
Add missing test diffs.
Jul 15 2020
Jul 14 2020
Change assertion to llvm_unreachable.
Rebase on top of test pre-commit.
Add assertions relating to MaskValue.
Jul 13 2020
Hard code SCC operand number.
Jul 12 2020
Jul 11 2020
Jul 6 2020
Technically this is NFC with current hardware configurations, but I would like to be able to decouple this from MAD/MAC.
Jul 2 2020
Jun 30 2020
- Add comment
- Rename generateEndPgm
- Edit comment for clarity
- Add explicit test to skip-if-dead.ll
Jun 29 2020
Can you explain the reasoning behind how you decide where to insert SI_KILL_CLEANUP?
I appreciate there might be some resistance to this way of doing things, i.e. relying on control flow lowering to insert cleanup markers. As an alternative I can rewrite this to look directly at the exec mask processing during the insert skips pass.
Jun 28 2020
Rework this on top of D77544.
Make insertion of skips near the beginning of blocks work correctly by splitting blocks.
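A minimal sketch of the block-splitting idea (toy instruction lists, not MachineBasicBlock code): when the skip must land near the top of a block, the block is split so the skip terminates the first half and the remaining instructions form a new fall-through block.

```python
def split_block_at(insts, idx):
    """Split an instruction list at idx; a skip/terminator can then be
    appended to the first half, with the rest moved to a new block."""
    return insts[:idx], insts[idx:]

# Inserting a skip before instruction index 1:
top, bottom = split_block_at(["i0", "i1", "i2"], 1)
top.append("skip -> exit")
```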
Jun 27 2020
We are trying to handle the kill intrinsic early during instruction selection.
I can try to incorporate it all entirely there, including the unify block code (if I can't find a better place for it now).
I will also update D77544 if this patch goes upstream.
Jun 26 2020
This is a stepping stone to adding early termination in more places, because it means termination can be added anywhere in a basic block (without splitting basic blocks).
Jun 23 2020
Jun 19 2020
Address comments on foldMemoryOperandImpl.
Ping - I would like to get this moving again.
- Add test for foldMemoryOperandImpl
- Rework code in foldMemoryOperandImpl to remove references to specific physical registers
Jun 18 2020
Jun 17 2020
This is the simplified version which blocks spill/reload of exec from occurring.
Simplify test changes.
Fix error in test update.
Can you try to constrain the class of the virtual register earlier to prevent this from happening instead? I think that should avoid this coming up (e.g. we already do MRI.constrainRegClass(SrcReg, &AMDGPU::SReg_32_XM0RegClass) to try to avoid m0 spills, plus in foldMemoryOperandImpl to avoid another problem when spills are optimized out).
Update spill-special-sgpr.mir for auto updated version.