Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.
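As a rough illustration of the rule stated above, here is a minimal, self-contained sketch. `ToySetCC` and `selectCompareFamily` are hypothetical names invented for this example; they are not LLVM types or functions, and the real selection is done in the AMDGPU backend's instruction selection rather than in code like this.

```cpp
#include <string>

// Hypothetical stand-in for a SETCC node; not an LLVM type.
struct ToySetCC {
  bool IsDivergent; // true if the compare result may differ between lanes
};

// A uniform compare can use the scalar S_CMP_* family,
// while a divergent compare needs the per-lane V_CMP_* family.
inline std::string selectCompareFamily(const ToySetCC &N) {
  return N.IsDivergent ? "V_CMP_*" : "S_CMP_*";
}
```

The sketch only captures the decision itself: the same compare maps to a different instruction family depending on whether its result is the same for every lane.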
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/extract_vector_dynelt.ll | ||
---|---|---|
52 | Should precommit switch to generated checks |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
6134–6136 | MI.readsRegister(SCC)? I also think this would break if we ever bothered to use the feature of directly using scc in instruction operands |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
6134–6136 | MI.readsRegister(SCC) does not fit because I need the exact operand index later on: if (NewCond.isValid()) MI.getOperand(SCCIdx).setReg(NewCond); |
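The point made in the reply above, that a yes/no query is not enough when the operand has to be rewritten afterwards, can be sketched on a toy model. Everything here (`ToyReg`, `ToyOperand`, `findUseOperandIdx`, `replaceUse`) is hypothetical and only mimics the shape of the problem; it is not the code from the patch. LLVM's `MachineInstr` has a similar index-returning query, `findRegisterUseOperandIdx`, whose exact signature has varied between LLVM versions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical register and operand types, for illustration only.
enum class ToyReg { None, SCC, VCC, SGPR0 };

struct ToyOperand {
  ToyReg Reg;
  bool IsUse; // true for a read, false for a definition
};

// Returns the index of the first operand that reads R, or -1 if none does.
inline int findUseOperandIdx(const std::vector<ToyOperand> &Ops, ToyReg R) {
  for (std::size_t I = 0; I < Ops.size(); ++I)
    if (Ops[I].IsUse && Ops[I].Reg == R)
      return static_cast<int>(I);
  return -1;
}

// Rewrites the first use of From to To. This needs the index, which a
// boolean "reads register" query would not provide.
inline bool replaceUse(std::vector<ToyOperand> &Ops, ToyReg From, ToyReg To) {
  int Idx = findUseOperandIdx(Ops, From);
  if (Idx < 0)
    return false;
  Ops[static_cast<std::size_t>(Idx)].Reg = To;
  return true;
}
```

Returning -1 for "not found" is a choice made for this sketch so that one call answers both questions: whether the register is read, and where.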
This change broke thousands of piglit gpu profile tests with Mesa radeonsi on Navi 14.
It also broke Vulkan CTS testing with LLPC on Navi 10 and, in addition to the failures, caused spurious debug output like:
Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupxor_u16vec4_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
  Pass (OK)
[...]
Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmax_int_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
  Fail (Failed!)
Heads-up, this change slightly conflicts textually with my change to select s_cselect which I just recommitted following a bug-fix in 0045786f146e78afee49eee053dc29ebc842fee1.
Reverted. I would very much appreciate it if you could share the temporary files from the failed run, just in case.
and as well as the failures it caused spurious debug output like:
Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmax_int_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
  Fail (Failed!)
This looks strange to me because my change does not add any output.
I think perhaps this was related to D81925, and I got confused because it started happening at about the same time that the tests started failing.
Could you please advise on how to reproduce the failure and how to collect the assembly of the failed shaders?
Is setting MESA_DEBUG enough?
This small piece was missing from the change.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 5f1afdd7f10..7180e0a8d52 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -634,6 +634,9 @@ void SIInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
   }
   if (DestReg == AMDGPU::SCC) {
+    if (AMDGPU::SReg_64RegClass.contains(SrcReg)) {
+      SrcReg = RI.getSubReg(SrcReg, AMDGPU::sub0);
+    }
     assert(AMDGPU::SReg_32RegClass.contains(SrcReg));
     BuildMI(MBB, MI, DL, get(AMDGPU::S_CMP_LG_U32))
       .addReg(SrcReg, getKillRegState(KillSrc))
It fixes the Vulkan failure.
You can reproduce it e.g. with
.../piglit/bin/depth-tex-modes-glsl -auto -fbo
Set the environment variable AMD_DEBUG=ps,preoptir,nonir,checkir,mono to get the LLVM IR on stderr (you may need to replace ps according to the affected shader type; run with AMD_DEBUG=help for a list of all supported debugging options).
@alex-t are you still planning to work on this? Or has it been (partly or wholly) superseded by @piotr's rG0045786f146e78afee49eee053dc29ebc842fee1?
Given the check above, wave32 should not even get here and should be handled elsewhere.