This optimization is currently part of the SIInsertSkips pass.
SIInsertSkips will soon be removed, so this patch moves this
specific optimization to a more appropriate place beforehand.
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:1017
The peephole is already invoked earlier, during SSAOptimization. It is needed again here to optimize a pattern that is only introduced later in the pipeline. The lit test multilevel-break.ll has a similar opportunity in the function multi_if_break_loop.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2182
Are you saying the initialization is not required?
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:1017
The pattern is introduced by Basic Block Placement: bb.2 and bb.5 are combined into bb.2.

IR dump before BB Placement:

      successors: %bb.5
      liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3:0x0000000C, $sgpr0_sgpr1, $sgpr4_sgpr5
      renamable $sgpr8_sgpr9 = S_MOV_B64 0
      renamable $sgpr6_sgpr7 = S_MOV_B64 -1
      renamable $sgpr10_sgpr11 = S_MOV_B64 -1
      S_BRANCH %bb.5

    bb.3:
      successors: %bb.6, %bb.8
      renamable $vcc = S_AND_B64 $exec, killed renamable $sgpr10_sgpr11, implicit-def dead $scc
      S_CBRANCH_VCCZ %bb.8, implicit $vcc

IR dump after BB Placement:

    bb.2:
      successors: %bb.6, %bb.8
      liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3:0x0000000C, $sgpr0_sgpr1, $sgpr4_sgpr5
      renamable $sgpr8_sgpr9 = S_MOV_B64 0
      renamable $sgpr6_sgpr7 = S_MOV_B64 -1
      renamable $sgpr10_sgpr11 = S_MOV_B64 -1
      renamable $vcc = S_AND_B64 $exec, killed renamable $sgpr10_sgpr11, implicit-def dead $scc
      S_CBRANCH_VCCZ %bb.8, implicit $vcc
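In the merged bb.2, `$sgpr10_sgpr11` holds -1, so `S_AND_B64 $exec, -1` simply copies EXEC into VCC and `S_CBRANCH_VCCZ` behaves exactly like `S_CBRANCH_EXECZ`. The following is a minimal sketch of that fold, assuming a toy instruction model: the `Inst` struct, the `Op` enum, and the `foldVccBranch`/`foldVccBranches` helpers are hypothetical and are not LLVM's `MachineInstr` API, and the sketch assumes the three instructions are adjacent, as they are in the dump. As an assumed extension it also treats a 0 mask, which makes VCC zero and the branch unconditional.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of the few SI instructions involved. This is NOT LLVM's
// MachineInstr API: registers are plain strings and only the operands
// needed for this fold are modelled.
enum class Op { S_MOV_B64, S_AND_B64, S_CBRANCH_VCCZ, S_CBRANCH_EXECZ, S_BRANCH, Other };

struct Inst {
  Op Opc;
  std::string Dst;     // defined register ("" if none)
  std::string Src0;    // first register source ("" if none)
  std::string Src1;    // second register source ("" if none)
  std::int64_t Imm;    // immediate, used by S_MOV_B64 only
  std::string Target;  // branch target, e.g. "bb.8"
};

// Hypothetical helper: tries to fold the sequence
//   sreg = S_MOV_B64 imm
//   vcc  = S_AND_B64 exec, sreg   (either operand order)
//   S_CBRANCH_VCCZ target
// ending at Block[I]. Only the branch is rewritten; the S_MOV_B64 and
// S_AND_B64 are left for a separate, liveness-aware cleanup.
bool foldVccBranch(std::vector<Inst> &Block, std::size_t I) {
  if (I < 2 || I >= Block.size() || Block[I].Opc != Op::S_CBRANCH_VCCZ)
    return false;
  const Inst &And = Block[I - 1];
  const Inst &Mov = Block[I - 2];
  if (And.Opc != Op::S_AND_B64 || And.Dst != "vcc")
    return false;
  std::string SReg;
  if (And.Src0 == "exec")
    SReg = And.Src1;
  else if (And.Src1 == "exec")
    SReg = And.Src0;
  else
    return false;
  if (SReg.empty() || Mov.Opc != Op::S_MOV_B64 || Mov.Dst != SReg)
    return false;
  Inst &Br = Block[I];
  if (Mov.Imm == -1) {
    // vcc = exec & ~0 = exec, so "vcc is zero" is exactly "exec is zero".
    Br.Opc = Op::S_CBRANCH_EXECZ;
    return true;
  }
  if (Mov.Imm == 0) {
    // vcc = exec & 0 = 0, so the VCCZ branch is always taken.
    Br.Opc = Op::S_BRANCH;
    return true;
  }
  return false;
}

// Applies the fold at every position; returns how many branches changed.
int foldVccBranches(std::vector<Inst> &Block) {
  int Changed = 0;
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (foldVccBranch(Block, I))
      ++Changed;
  return Changed;
}
```

Applied to the five instructions of bb.2 after placement, the sketch turns the final `S_CBRANCH_VCCZ %bb.8` into `S_CBRANCH_EXECZ %bb.8`. It deliberately leaves the S_MOV_B64 and S_AND_B64 in place, because deleting them is only safe once VCC and SCC are known not to be live afterwards.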
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2182
Sure, will do that.
If this can't work in SSA, then it shouldn't be done in PeepholeOptimizer.
I'm also noticing a few defects in the existing handling. If I disable the optimization in test/CodeGen/AMDGPU/multilevel-break.ll, the dead AND is actually left behind. I'm assuming this is because of the check for a dead SCC def; this should instead use LivePhysRegs to make sure SCC is not live-out.
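The liveness check suggested here can be sketched as a backward scan: start from the block's live-outs (the union of its successors' live-ins) and step backwards over every instruction after the S_AND_B64; SCC counts as dead only if it is absent from the resulting set, rather than relying on the `dead` flag on its def. In LLVM, `LivePhysRegs` provides this through `addLiveOuts` and `stepBackward`; the code below does not use that class and instead models registers as strings in a hypothetical `ToyMI` struct with an illustrative `isLiveAfter` helper.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy instruction: just the physical registers it defines and reads.
// This is a stand-in for MachineInstr operands, not an LLVM type.
struct ToyMI {
  std::set<std::string> Defs;
  std::set<std::string> Uses;
};

// Hypothetical helper: returns true if Reg is live immediately after
// Block[Idx]. LiveOuts models the union of the successors' live-ins,
// which is what makes a register live-out of the block.
bool isLiveAfter(const std::vector<ToyMI> &Block, std::size_t Idx,
                 const std::set<std::string> &LiveOuts,
                 const std::string &Reg) {
  std::set<std::string> Live = LiveOuts;
  // Walk backwards from the last instruction down to Block[Idx + 1]:
  // a def ends the register's live range above it, and a use starts one.
  for (std::size_t I = Block.size(); I > Idx + 1; --I) {
    const ToyMI &MI = Block[I - 1];
    for (const std::string &D : MI.Defs)
      Live.erase(D);
    for (const std::string &U : MI.Uses)
      Live.insert(U);
  }
  return Live.count(Reg) != 0;
}
```

For the S_AND_B64 followed by S_CBRANCH_VCCZ, this reports SCC as dead when no successor needs it and as live when SCC appears in the live-outs, even though nothing inside the block reads it; VCC stays live because the branch reads it.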
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:1017
This still has the extra pass run.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:1017
I'm planning to introduce a late pass called SIPreEmitPeephole to handle it. In general, this pass can handle any late optimization opportunities identified before code emission.
Abandoning this review.
This optimization should be handled late, after Basic Block Placement. A new review will be opened that handles it in a late pass.
Should not need an extra run of this.