Details
Diff Detail
Event Timeline
Is WQM needed for correctness or is it just an optimization?
lib/Target/AMDGPU/SIWholeQuadMode.cpp | ||
---|---|---|
514–515 | Should set DebugLoc | |
519–520 | These can be combined with findRegisterUseOperandIdx | |
533–534 | Should set DebugLoc | |
560 | This function is kind of complicated to special case track SCC liveness. Can you use LivePhysRegs or check the LiveIntervals to simplify it (I think there was some issue with tracking physical registers with LIS last time I tried to do it, but I don't remember) | |
569 | Capitalize | |
575–578 | The Clean = true can be hoisted to the initializer | |
581–582 | range loop | |
583 | Maybe this should get the read operand and see if it is undef? | |
587–588 | Should check if it's dead? | |
606 | I think these should use MI's DebugLoc | |
614–615 | Insert through LIS | |
618 | This can just return Restore | |
test/CodeGen/AMDGPU/wqm.ll | ||
469 | I don't see check lines for the cmp + select restore pattern here. Where is the scc def? This test also probably needs a comment | |
475 | CHECK-NEXT for s_cbranch_scc? |
Can you post an MI dump example of when this happens? It seems like this isn't something you should need to handle
Rewrite in terms of LiveIntervals. (LivePhysRegs complains about remaining
virtual registers.)
Not sure about those DebugLocs. I'm not very familiar with their semantics,
but there isn't usually one particular instruction that can be pinned down
as the reason for inserting a new instruction (for example, when a new
instruction is added at the end of a basic block).
WQM is needed for correctness of pixel shaders. It must be enabled for all calculations that feed into derivative calculations (explicit ones via DS and implicit ones via image sample instructions), and it must be disabled for all stores.
test/CodeGen/AMDGPU/wqm.ll | ||
---|---|---|
469 | That's because I didn't actually manage to concoct a test where the pattern is unavoidable. Usually, the SCC def ends up right before the SCC use, so that the WQM instruction can just be moved around it. I've tested the pattern with some artificial hacks in the code (in the latest version by setting First == Last in prepareInsertion). The problem is that I cannot prove that it will never be needed. For example, it might be needed if the machine scheduler makes unusual decisions about moving store instructions between the SCC def and the SCC use. |
The added test case in an example (I actually encountered this out in the real world in a piglit test.)
As you can see from the excerpt at http://paste.ubuntu.com/19764805/, the most natural place to insert the s_wqm_b64 would be between the s_cmp_lt and the s_cbranch. What the new code does instead is to maintain a range of places where the s_wqm_b64 could be inserted (based on the correctness constraints for WQM), and then finds a place in that range where SCC isnt' live.
include/llvm/CodeGen/LiveIntervalAnalysis.h | ||
---|---|---|
395–400 ↗ | (On Diff #64855) | This should probably be a separate patch |
lib/Target/AMDGPU/SIWholeQuadMode.cpp | ||
445–449 | I think this should use -1 instead of 1, and use compare != 0 to be more canonical | |
445–449 | Maybe you should just emit COPY and move this expansion to copyPhysReg for scc |
It turns out that r280589 introduced a piglit regression that needs this
change to be applied already now rather than as part of later WQM rework.
I had originally planned this after D22195, but with the recent changes to
control flow lowering that other patch is currently not ready to land / be
adapted, so I adjusted this SCC fix patch so that it applies on current SVN
trunk.
lib/Target/AMDGPU/SIWholeQuadMode.cpp | ||
---|---|---|
424 | MI.isTransient() might be more accurate. INLINEASM for example will not be removed |
MI.isTransient() might be more accurate. INLINEASM for example will not be removed