Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
I'd love this to go in, but when I added the hasOneUse() check it was certainly needed. If my old notes serve me well there was a crash in ctlz.ll test and I concluded this check was needed to avoid some shenanigans in the si-fix-sgpr-copies. I need to double check if the issue has been fixed or is just hidden.
I'd love this to go in, but when I added the hasOneUse() check it was certainly needed. If my old notes serve me well there was a crash in ctlz.ll test and I concluded this check was needed to avoid some shenanigans in the si-fix-sgpr-copies. I need to double check if the issue has been fixed or is just hidden.
Since posting the patch I've found one out-of-tree case that fails to compile:
; RUN: llc -march=amdgcn -mcpu=gfx1030 < %s define amdgpu_cs void @main() { bb: %i = load i32, i32 addrspace(3)* null, align 16 br label %bb1 bb1: %i2 = phi i32 [ 0, %bb ], [ %i9, %bb5 ] br label %bb3 bb3: %i4 = icmp eq i32 %i2, 0 br i1 %i4, label %bb5, label %bb3 bb5: %i6 = icmp ult i32 0, %i %i7 = sext i1 %i6 to i32 %i8 = add i32 %i7, 1 %i9 = and i32 %i8, %i7 br label %bb1 }
The DAG just before selection looks like this, with two uses of t23:
t0: ch = EntryToken t3: i32,ch = load<(load (s32) from `i32 addrspace(3)* null`, align 16, addrspace 3)> t0, Constant:i32<0>, undef:i32 t23: i1 = setcc t3, Constant:i32<0>, setne:ch t17: i32,i1 = subcarry Constant:i32<1>, Constant:i32<0>, t23 t25: i32 = select t23, t17, Constant:i32<0> t11: ch = CopyToReg t0, Register:i32 %0, t25 t13: ch = CopyToReg t0, Register:i32 %4, Constant:i32<0> t14: ch = TokenFactor t11, t13
After selection it has introduced a CopyToReg from $scc which I guess we don't support, because it will use SCC_CLASS which is not allocatable:
t0: ch = EntryToken t1: i32 = S_MOV_B32 TargetConstant:i32<0> t29: i32 = V_MOV_B32_e32 TargetConstant:i32<0> t3: i32,ch = DS_READ_B32_gfx9<Mem:(load (s32) from `i32 addrspace(3)* null`, align 16, addrspace 3)> t29, TargetConstant:i16<0>, TargetConstant:i1<0>, t0 t23: i1 = S_CMP_LG_U32 t3, t1 t7: i32 = S_MOV_B32 TargetConstant:i32<1> t17: i32,i1 = S_SUB_CO_PSEUDO t7, t1, t23 t27: ch,glue = CopyToReg t0, Register:i1 $scc, t23 t25: i32 = S_CSELECT_B32 t17, t1, t27:1 t11: ch = CopyToReg t0, Register:i32 %0, t25 t13: ch = CopyToReg t0, Register:i32 %4, t1 t14: ch = TokenFactor t11, t13
Shortly after this it crashes in ScheduleDAGSDNodes::EmitPhysRegCopy with: MachineRegisterInfo.cpp:160: llvm::Register llvm::MachineRegisterInfo::createVirtualRegister(const llvm::TargetRegisterClass *, llvm::StringRef): Assertion 'RegClass->isAllocatable() && "Virtual register RegClass must be allocatable."' failed.
Can you recommit this test
Shortly after this it crashes in ScheduleDAGSDNodes::EmitPhysRegCopy with: MachineRegisterInfo.cpp:160: llvm::Register llvm::MachineRegisterInfo::createVirtualRegister(const llvm::TargetRegisterClass *, llvm::StringRef): Assertion 'RegClass->isAllocatable() && "Virtual register RegClass must be allocatable."' failed.
The should be able to use SReg_32 which we do handle
The should be able to use SReg_32 which we do handle
I can get that to work with a patch like this: https://reviews.llvm.org/differential/diff/426045/
I'm not too happy that I had to change InstrEmitter::EmitCopyFromReg. It makes me wonder if we should be handling uniform compare+select patterns much more like a flags-based CPU does, either by gluing the s_cmp to the s_cselect, or using ISD::SELECT_CC instead of ISD::SELECT in the first place, so that it is all in one DAG node.
Also the codegen is not particularly pretty, but maybe it can be cleaned up by tweaking SIFixSGPRCopies, which has for some reason converted s_cmp to v_cmp but not converted the following s_cselect to v_cndmask:
diff --git a/llvm/test/CodeGen/AMDGPU/setcc-multiple-use.ll b/llvm/test/CodeGen/AMDGPU/setcc-multiple-use.ll index bff4c0c1533a..6c99d04f6410 100644 --- a/llvm/test/CodeGen/AMDGPU/setcc-multiple-use.ll +++ b/llvm/test/CodeGen/AMDGPU/setcc-multiple-use.ll @@ -18,7 +18,9 @@ define i32 @f() { ; CHECK-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0 ; CHECK-NEXT: s_cmpk_lg_u32 vcc_lo, 0x0 ; CHECK-NEXT: s_subb_u32 s4, 1, 0 -; CHECK-NEXT: v_cndmask_b32_e64 v0, 0, s4, vcc_lo +; CHECK-NEXT: s_and_b32 s5, vcc_lo, exec_lo +; CHECK-NEXT: s_cselect_b32 s4, s4, 0 +; CHECK-NEXT: v_mov_b32_e32 v0, s4 ; CHECK-NEXT: s_setpc_b64 s[30:31] bb: %i = load i32, i32 addrspace(3)* null, align 16
The biggest headache comes from the fact that during moveToVALU when S_CMP gets converted to V_CMP the users of SCC need to be handled properly otherwise you end up with a weird copy from SCC. I think this is handled right now by adding an extra copy from VCC to SCC to make the connection between V_CMP and S_CSELECT until it is time for the handling of S_CSELECT. This gets more tricky when there are more uses of SCC I guess.
I was thinking we would add an SReg_32_PlusSCC class to use for for conditions (or I guess we could just add SCC directly to SReg_32). GlobalISel directly emits copies between SReg_32 and SCC but I guess InstrEmitter's magic requires an allocatable class
Ping for the new version, which only allows multiple uses if they are all selects. I realise this is not ideal but it is strictly more permissive than the current code, which only allows a single use (which must be a select).
llvm/lib/Target/AMDGPU/SOPInstructions.td | ||
---|---|---|
515 | This is to cope with other uses that have already been selected to CopyToReg, S_CSELECT_B32. I don't know if there's a better way to handle this. |
Looks good to me - the extra s_cselect's generated are worth the complexity arising from this patch.
Can you add a test (or point me to an existing one), where there are multiple uses of scc and some selects, but not all, are used by vector instructions so they would normally be transformed to v_cndmask during moveToValu?
Ping for the new version, which only allows multiple uses if they are all selects.
I think this restriction might not be necessary after D128681.
The hasOneUse check has always been there, since D81925 introduced SelectPat, but I don't see any reason for it.