Details
- Reviewers: arsenm, foad
- Commits: rG11cd92a70f26: [NFC] Tests for future commit in DAGCombiner
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/combine_andor_with_cmps.ll
Line 2: CSE changes the result of my optimization, so I need to do the checking earlier. For example, the following test:

    define i1 @test1(i32 %arg1, i32 %arg2) #0 {
      %cmp1 = icmp slt i32 %arg1, 1000
      %cmp2 = icmp slt i32 %arg2, 1000
      %or = or i1 %cmp1, %cmp2
      ret i1 %or
    }

will be optimized as follows with my optimization:

    bb.0 (%ir-block.0):
      liveins: $vgpr0, $vgpr1
      %1:vgpr_32 = COPY $vgpr1
      %0:vgpr_32 = COPY $vgpr0
      %2:vgpr_32 = V_MIN_I32_e64 %0, %1, implicit $exec
      %3:sreg_32 = S_MOV_B32 1000
      %4:sreg_32_xm0_xexec = V_CMP_LT_I32_e64 killed %2, killed %3, implicit $exec
      %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, killed %4, implicit $exec
      $vgpr0 = COPY %5
      SI_RETURN implicit $vgpr0

This is the output after instruction selection. After CSE, the predicate of the compare instruction changes:

    ; %bb.0:
      s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
      s_waitcnt_vscnt null, 0x0
      v_min_i32_e32 v0, v0, v1
      s_delay_alu instid0(VALU_DEP_1)
      v_cmp_gt_i32_e32 vcc_lo, 0x3e8, v0
      v_cndmask_b32_e64 v0, 0, 1, vcc_lo
      s_setpc_b64 s[30:31]
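The combine in this example relies on the identity that, for signed 32-bit integers, `(a < C) || (b < C)` holds exactly when `smin(a, b) < C`. The final ISA also writes the same test as `v_cmp_gt_i32_e32 vcc_lo, 0x3e8, v0`, i.e. `1000 > v0`, which is `v0 < 1000` with the operands swapped (0x3e8 = 1000). The following is a hedged, brute-force Python sanity check of both facts over boundary values; all helper names are hypothetical and none of this is code from the patch. The `and`/`smax` counterpart is included only as an assumption suggested by the test file's name.

```python
# Brute-force sanity check of the integer identities behind the combine.
# Hypothetical helper names; not code from the patch or from LLVM.
import itertools

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1

# Boundary-heavy sample of signed i32 values, including the constant 1000 used in the test.
SAMPLES = [INT32_MIN, INT32_MIN + 1, -1001, -1, 0, 1, 999, 1000, 1001, INT32_MAX - 1, INT32_MAX]


def or_of_slt(a: int, b: int, c: int) -> bool:
    """Original IR: or (icmp slt a, c), (icmp slt b, c)."""
    return a < c or b < c


def slt_of_smin(a: int, b: int, c: int) -> bool:
    """Combined form: icmp slt (smin a, b), c."""
    return min(a, b) < c


def and_of_slt(a: int, b: int, c: int) -> bool:
    """Assumed 'and' counterpart: and (icmp slt a, c), (icmp slt b, c)."""
    return a < c and b < c


def slt_of_smax(a: int, b: int, c: int) -> bool:
    """Assumed combined form for 'and': icmp slt (smax a, b), c."""
    return max(a, b) < c


def v_cmp_gt(src0: int, src1: int) -> bool:
    """Models v_cmp_gt_i32 src0, src1 as src0 > src1."""
    return src0 > src1


def check_all() -> int:
    """Return the number of (a, b, c) triples checked; raise AssertionError on any mismatch."""
    count = 0
    for a, b, c in itertools.product(SAMPLES, repeat=3):
        assert or_of_slt(a, b, c) == slt_of_smin(a, b, c), (a, b, c)
        assert and_of_slt(a, b, c) == slt_of_smax(a, b, c), (a, b, c)
        count += 1
    # The ISA form "v_cmp_gt_i32 vcc_lo, 0x3e8, v0" computes 1000 > v0, i.e. v0 < 1000.
    for m in SAMPLES:
        assert v_cmp_gt(0x3E8, m) == (m < 1000), m
    return count


if __name__ == "__main__":
    print(f"checked {check_all()} triples")
```

Python integers do not overflow, so the check only exercises comparisons within the i32 range, which is all the identity needs.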
Line 2: I don't understand. I assume you mean MachineCSE? Is your patch not actually a DAG combine as the description states? Can you stop somewhere after SIFixSGPRCopies instead?
Line 2: Sorry, I meant MachineCSE. The patch that I will upload implements the optimization in DAGCombiner.
Line 2: Sorry, I did not understand your comment earlier :) I updated the test.
Line 2: My original point still stands. Why can't you test the end ISA? In general, optimization patches are better off testing end to end unless you specifically need to check some intermediate state.
Line 2: Let's say we have the following test:

    define i1 @test1(i32 %arg1, i32 %arg2) #0 {
      %cmp1 = icmp slt i32 %arg1, 1000
      %cmp2 = icmp slt i32 %arg2, 1000
      %or = or i1 %cmp1, %cmp2
      ret i1 %or
    }

The dump after SI Fix SGPR copies is:

    bb.0 (%ir-block.0):
      liveins: $vgpr0, $vgpr1
      %1:vgpr_32 = COPY $vgpr1
      %0:vgpr_32 = COPY $vgpr0
      %2:vgpr_32 = V_MIN_I32_e64 %0, %1, implicit $exec
      %3:sreg_32 = S_MOV_B32 1000
      %4:sreg_32_xm0_xexec = V_CMP_LT_I32_e64 killed %2, killed %3, implicit $exec
      %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, killed %4, implicit $exec
      $vgpr0 = COPY %5
      SI_RETURN implicit $vgpr0

The final output is:

    %bb.0:
      s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
      s_waitcnt_vscnt null, 0x0
      v_min_i32_e32 v0, v0, v1
      s_delay_alu instid0(VALU_DEP_1)
      v_cmp_gt_i32_e32 vcc_lo, 0x3e8, v0
      v_cndmask_b32_e64 v0, 0, 1, vcc_lo
      s_setpc_b64 s[30:31]

So it is easier to check the correctness of the optimization after SI Fix SGPR copies. If you do not like it, then I can change it.
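Part of what makes the end ISA harder to read here is that the compare appears as `v_cmp_gt_i32_e32 vcc_lo, 0x3e8, v0` rather than the `V_CMP_LT_I32_e64 killed %2, killed %3` seen after SI Fix SGPR copies; the two are the same comparison with the operands swapped and the predicate mirrored. A minimal Python sketch of normalizing such a compare so that both forms compare equal, assuming a simplified operand model (registers as strings, immediates as ints). The predicate table and function names are hypothetical and only similar in spirit to LLVM's `CmpInst::getSwappedPredicate`.

```python
# Sketch: normalize a two-operand integer compare so an immediate ends up on the right.
# Hypothetical helpers; the operand model (register names as str, immediates as int) is an assumption.
from typing import Dict, Tuple, Union

Operand = Union[int, str]

# Mirrored predicate when the operands are swapped: (x OP y) == (y SWAPPED[OP] x).
SWAPPED: Dict[str, str] = {"lt": "gt", "gt": "lt", "le": "ge", "ge": "le", "eq": "eq", "ne": "ne"}


def parse_operand(text: str) -> Operand:
    """Turn an operand such as '0x3e8' or '1000' into an int; keep register names as strings."""
    text = text.strip()
    try:
        return int(text, 0)  # base 0 accepts both decimal and 0x-prefixed hex literals
    except ValueError:
        return text


def canonicalize(pred: str, lhs: Operand, rhs: Operand) -> Tuple[str, Operand, Operand]:
    """If the immediate is on the left and a register on the right, swap them and mirror the predicate."""
    if isinstance(lhs, int) and not isinstance(rhs, int):
        return SWAPPED[pred], rhs, lhs
    return pred, lhs, rhs


if __name__ == "__main__":
    # ISA form from the thread: v_cmp_gt_i32 vcc_lo, 0x3e8, v0
    print(canonicalize("gt", parse_operand("0x3e8"), parse_operand("v0")))
```

Normalized this way, the ISA compare reads as `("lt", "v0", 1000)`, the same predicate and operand order as the MIR dump, which is one way a checker could stay insensitive to the commuted form.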
Line 2: Which flag?
Line 2: -amdgpu-enable-delay-alu=0
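As a rough illustration of where that flag would go, here is a small Python sketch that assembles an `llc` command line with it. `-mtriple` and `-mcpu` are standard `llc` options, but the `amdgcn` triple and `gfx1100` CPU are assumptions inferred from the wave32 GFX11-style asm in this thread; the test's actual RUN lines may differ, and `llc_command` is a hypothetical helper.

```python
# Sketch: assemble an llc argv with s_delay_alu insertion disabled.
# Triple and CPU defaults are assumptions (the asm above looks like wave32 GFX11 code);
# llc_command is a hypothetical helper, not part of LLVM.
import shlex
from typing import List


def llc_command(input_path: str, mcpu: str = "gfx1100", disable_delay_alu: bool = True) -> List[str]:
    """Build an argv list for compiling an AMDGPU test file with llc."""
    cmd = ["llc", "-mtriple=amdgcn", f"-mcpu={mcpu}"]
    if disable_delay_alu:
        # Per this thread, this stops lines like "s_delay_alu instid0(VALU_DEP_1)" from being
        # emitted, so check lines for the final ISA do not have to match them.
        cmd.append("-amdgpu-enable-delay-alu=0")
    cmd.append(input_path)
    return cmd


if __name__ == "__main__":
    # Only prints the command; running it requires an llc binary on PATH.
    print(shlex.join(llc_command("combine_andor_with_cmps.ll")))
```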
Lines 19–20: I added test65.
Line 1875: Also test f16 and v2f16.
Line 1875: I added tests test84, test85, test86, test87, test88, test89, test90 and test91.
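Floating-point variants such as f16 and v2f16 are worth testing separately because the integer identity does not carry over unchanged: with NaN inputs, `or (fcmp olt a, C), (fcmp olt b, C)` agrees with `fcmp olt (minnum a, b), C` but not with a NaN-propagating minimum, and the `and` case behaves the other way round. The hedged Python sketch below models these semantics with plain Python floats (not real half-precision arithmetic, and scalars only, so v2f16 is not modeled); the helper names are hypothetical, and this thread does not say whether or how the patch handles FP types.

```python
# Sketch: why FP (e.g. f16/v2f16) variants of the or/and-of-compares fold need their own tests.
# Plain Python floats stand in for half precision; only NaN and ordering behaviour matter here.
# minnum/maxnum and minimum/maximum are hand-written models of IEEE-754 NaN semantics, not LLVM APIs.
import itertools
import math
from typing import Callable

NAN = math.nan
SAMPLES = [NAN, -math.inf, -1000.0, -0.0, 0.0, 999.0, 1000.0, 1001.0, math.inf]

Pred3 = Callable[[float, float, float], bool]


def olt(x: float, y: float) -> bool:
    """fcmp olt: false if either operand is NaN (Python's '<' already behaves this way)."""
    return x < y


def minnum(x: float, y: float) -> float:
    """minNum-style: a NaN operand is ignored in favour of the other operand."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return min(x, y)


def maxnum(x: float, y: float) -> float:
    """maxNum-style counterpart of minnum."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)


def minimum(x: float, y: float) -> float:
    """NaN-propagating minimum: any NaN operand gives NaN."""
    return NAN if math.isnan(x) or math.isnan(y) else min(x, y)


def maximum(x: float, y: float) -> float:
    """NaN-propagating maximum: any NaN operand gives NaN."""
    return NAN if math.isnan(x) or math.isnan(y) else max(x, y)


def or_olt(a: float, b: float, c: float) -> bool:
    """Original pattern: or (fcmp olt a, c), (fcmp olt b, c)."""
    return olt(a, c) or olt(b, c)


def and_olt(a: float, b: float, c: float) -> bool:
    """Original pattern: and (fcmp olt a, c), (fcmp olt b, c)."""
    return olt(a, c) and olt(b, c)


def mismatches(original: Pred3, folded: Pred3) -> int:
    """Count sample triples where a candidate fold disagrees with the original compare pair."""
    return sum(
        original(a, b, c) != folded(a, b, c)
        for a, b, c in itertools.product(SAMPLES, repeat=3)
    )


if __name__ == "__main__":
    print("or  + minnum :", mismatches(or_olt, lambda a, b, c: olt(minnum(a, b), c)))
    print("or  + minimum:", mismatches(or_olt, lambda a, b, c: olt(minimum(a, b), c)))
    print("and + maximum:", mismatches(and_olt, lambda a, b, c: olt(maximum(a, b), c)))
    print("and + maxnum :", mismatches(and_olt, lambda a, b, c: olt(maxnum(a, b), c)))
```

Under these models the `or` fold is exact with a minNum-style min and the `and` fold with a NaN-propagating max; the opposite pairings can disagree when one input is NaN (for example `a = NaN, b = 0.0, C = 1000.0`).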
Why use MIR for this?