Page MenuHomePhabricator

[AMDGPU] Enable compare operations to be selected by divergence
ClosedPublic

Authored by alex-t on Jun 19 2020, 7:57 AM.

Details

Summary

Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.

Diff Detail

Event Timeline

alex-t created this revision.Jun 19 2020, 7:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 19 2020, 7:57 AM
rampitec added inline comments.Jun 19 2020, 9:26 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
608

Given the check above wave32 should not even get here and shall be handles elsewhere.

5266

RI.getVCC()

alex-t updated this revision to Diff 272254.Jun 20 2020, 7:24 AM

Code changed according to the reviewer request

alex-t marked 2 inline comments as done.Jun 20 2020, 7:29 AM
rampitec added inline comments.Jun 22 2020, 11:24 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5273–5274

Move brake inside the brace to fix the formatting.

5401

Also move continue inside the brace.

5836–5837

Reformat this.

arsenm added inline comments.Jun 22 2020, 12:47 PM
llvm/test/CodeGen/AMDGPU/extract_vector_dynelt.ll
52

Should precommit switch to generated checks

alex-t updated this revision to Diff 272643.Jun 23 2020, 1:53 AM

Formatting fixed. test extract_vector_dynelt.ll changed.

alex-t marked 4 inline comments as done.Jun 23 2020, 2:18 AM
alex-t updated this revision to Diff 272689.Jun 23 2020, 5:31 AM

udivrem.ll checks updated

arsenm added inline comments.Jun 23 2020, 5:40 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6134–6136

MI.readsRegister(SCC)? I also think this would break if we ever bothered to use the feature of directly using scc in instruction operands

This revision is now accepted and ready to land.Jun 23 2020, 11:53 AM
alex-t marked an inline comment as done.Jun 24 2020, 12:36 AM
alex-t added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6134–6136

MI.readsRegister(SCC) does not fit because I need the exact operand index later on.

if (NewCond.isValid())
    MI.getOperand(SCCIdx).setReg(NewCond);
This revision was automatically updated to reflect the committed changes.

This change broke thousands of piglit gpu profile tests with Mesa radeonsi on Navi 14.

foad added a subscriber: foad.Jun 25 2020, 2:40 AM

This change broke thousands of piglit gpu profile tests with Mesa radeonsi on Navi 14.

It also broke Vulkan CTS testing with LLPC on Navi 10, and as well as the failures it caused spurious debug output like:

Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupxor_u16vec4_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
  Pass (OK)

[...]

Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmax_int_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
  Fail (Failed!)
piotr added a subscriber: piotr.Jun 25 2020, 2:50 AM

Heads-up, this change slightly conflicts textually with my change to select s_cselect which I just recommitted following a bug-fix in 0045786f146e78afee49eee053dc29ebc842fee1.

This change broke thousands of piglit gpu profile tests with Mesa radeonsi on Navi 14.

Reverted. But I would appreciate very much if you just in case can share temporay files from the failed run.

and as well as the failures it caused spurious debug output like:

Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmax_int_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
  Fail (Failed!)

This looks strange to me cause my change does not add any output.

foad added a comment.Jun 26 2020, 1:12 AM

and as well as the failures it caused spurious debug output like:

Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmax_int_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
  Fail (Failed!)

This looks strange to me cause my change does not add any output.

I think perhaps this was related to D81925, and I got confused because it started happening at about the same time that the tests started failing.

This change broke thousands of piglit gpu profile tests with Mesa radeonsi on Navi 14.

Could you please advice on how to reproduce the failure and how to collect the failed shaders assembly?
Is MESA_DEBUG set is enough?

This small piece was missed from the change.

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 5f1afdd7f10..7180e0a8d52 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -634,6 +634,9 @@ void SIInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
   }

   if (DestReg == AMDGPU::SCC) {
+    if (AMDGPU::SReg_64RegClass.contains(SrcReg)) {
+      SrcReg = RI.getSubReg(SrcReg, AMDGPU::sub0);
+    }
     assert(AMDGPU::SReg_32RegClass.contains(SrcReg));
     BuildMI(MBB, MI, DL, get(AMDGPU::S_CMP_LG_U32))
       .addReg(SrcReg, getKillRegState(KillSrc))

it fixes Vulkan failure.

Could you please advice on how to reproduce the failure and how to collect the failed shaders assembly?
Is MESA_DEBUG set is enough?

You can reproduce it e.g. with

.../piglit/bin/depth-tex-modes-glsl -auto -fbo

Set the environment variable AMD_DEBUG=ps,preoptir,nonir,checkir,mono to get the LLVM IR on stderr (may need to replace ps according to the affected shader type, run with AMD_DEBUG=help for a list of all supported debugging options).

foad added a comment.EditedOct 27 2020, 6:47 AM

@alex-t are you still planning to work on this? Or has it been (partly or wholly) superseded by
@piotr's rG0045786f146e78afee49eee053dc29ebc842fee1?

This comment was removed by alex-t.

@alex-t are you still planning to work on this? Or has it been (partly or wholly) superseded by
@piotr's rG0045786f146e78afee49eee053dc29ebc842fee1?

I still hope to get back to this.