This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] moving vcc branch optimization into peephole
AbandonedPublic

Authored by cdevadas on Mar 3 2020, 12:11 AM.

Details

Reviewers
arsenm
rampitec
Summary

This optimization is presently included in the SIInsertSkips pass.
SIInsertSkips will soon go away. Before that happens, move this
specific optimization into an appropriate place.

Diff Detail

Event Timeline

cdevadas created this revision. Mar 3 2020, 12:11 AM
Herald added a project: Restricted Project. Mar 3 2020, 12:11 AM
arsenm added inline comments. Mar 3 2020, 11:26 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

Should not need an extra run of this

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2182

Register, and initialization isn't needed

cdevadas marked 2 inline comments as done. Mar 3 2020, 7:28 PM
cdevadas added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

The peephole is invoked earlier during SSAOptimization. This extra run is required to optimize a pattern that is introduced later. The lit test multilevel-break.ll has a similar opportunity in function multi_if_break_loop.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2182

Are you saying the initialization is not required?
SReg is not defined along all control-flow paths later.

arsenm added inline comments. Mar 6 2020, 10:43 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

Where is the pattern introduced? Does this ever trigger in the initial run?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2182

If you use Register instead of unsigned, it default initializes to NoRegister/0

cdevadas marked 2 inline comments as done. Mar 7 2020, 9:02 AM
cdevadas added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

The pattern is introduced by Basic Block Placement (see below: bb.2 and bb.5 are combined into bb.2).
The extra run is required to optimize it.

IR Dump before BB Placement:
multi_if_break_loop:
bb.2:
  successors: %bb.5
  liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3:0x0000000C, $sgpr0_sgpr1, $sgpr4_sgpr5
  renamable $sgpr8_sgpr9 = S_MOV_B64 0
  renamable $sgpr6_sgpr7 = S_MOV_B64 -1
  renamable $sgpr10_sgpr11 = S_MOV_B64 -1
  S_BRANCH %bb.5

bb.3:
  // Insns

bb.5:
  successors: %bb.6, %bb.8
  renamable $vcc = S_AND_B64 $exec, killed renamable $sgpr10_sgpr11, implicit-def dead $scc
  S_CBRANCH_VCCZ %bb.8, implicit $vcc

IR Dump after BB Placement:
multi_if_break_loop:
bb.2:
  successors: %bb.6, %bb.8
  liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3:0x0000000C, $sgpr0_sgpr1, $sgpr4_sgpr5
  renamable $sgpr8_sgpr9 = S_MOV_B64 0
  renamable $sgpr6_sgpr7 = S_MOV_B64 -1
  renamable $sgpr10_sgpr11 = S_MOV_B64 -1
  renamable $vcc = S_AND_B64 $exec, killed renamable $sgpr10_sgpr11, implicit-def dead $scc
  S_CBRANCH_VCCZ %bb.8, implicit $vcc
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2182

Sure, will do that.

cdevadas updated this revision to Diff 249677. Mar 11 2020, 10:44 AM

Incorporated the suggestion.

If this can't work in SSA, then it shouldn't be done in PeepholeOptimizer.

I'm also noticing a few defects in the existing handling. If I disable the optimization in test/CodeGen/AMDGPU/multilevel-break.ll, the dead AND is actually left behind. I'm assuming this is because of the check for dead SCC, but this should be using LivePhysRegs to make sure SCC is not live out.

arsenm added inline comments. Mar 19 2020, 2:41 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

Still has the extra pass run

cdevadas marked an inline comment as done. Mar 20 2020, 3:55 AM
cdevadas added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

Planning to introduce a late pass called 'SIPreEmitPeephole' to handle it. In general, that pass can handle any late optimization opportunities identified before code emission.

cdevadas abandoned this revision. Mar 24 2020, 9:22 AM

Abandoning this review.
This optimization should be handled late, after Basic Block Placement. A new review will be opened that handles it in a late pass.