This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Deduplicate restore blocks in branch relaxation
Needs ReviewPublic

Authored by piggynl on Aug 10 2022, 9:18 AM.

Details

Summary

For multiple far branches to the same destination, there are chances that some restore blocks could be shared if they clobber the same registers and share the same restore sequence. Since RISCV generates a same restore block consisting only ld s11, 0(sp) for all jumps to one destination (see D130560), deduplicating them would save a few bytes in that case.

This patch stores all restore blocks for a destination in RestoreBlocks in the destination's BasicBlockInfo, allowing the target to call DeduplicateRestoreBB() to find a previously inserted restore block identical to RestoreBB and insert an unconditional branch to it, then BranchRelaxation::fixupUnconditionalBranch() can erase RestoreBB from the function.

Suppose jump .dest is out of range so need a register to jump indirectly in the following code.

.label_0:
        foo     1
        jump    .dest
.label_1:
        foo     2
        jump    .dest
.label_2:
        bar
        bar
        bar
        bar
.dest:
        baz

Without this patch, it will be relaxed to:

.label_0:
        foo     1
        store   reg1, (sp)
        mov     reg1, .restore_1
        jump    reg1
.label_1:
        foo     2
        store   reg1, (sp)
        mov     reg1, .restore_2
        jump    reg1
.label_2:
        bar
        bar
        bar
        bar
.restore_1:
        load    reg1, (sp)
        jump    .dest
.restore_2:
        load    reg1, (sp)
.dest:
        baz

With this patch, restore blocks can be deduplicated, thus generating the following code:

.label_0:
        foo
        store   reg1, (sp)
        mov     reg1, .restore_1
        jump    reg1
.label_1:
        foo
        store   reg1, (sp)
        mov     reg1, .restore_1
        jump    reg1
.label_2:
        bar
        bar
        bar
        bar
.restore_1:
        load    reg1, (sp)
.dest:
        baz

Depends on D131865, which pre-commits the modified test.

Diff Detail

Event Timeline

piggynl created this revision.Aug 10 2022, 9:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2022, 9:18 AM
piggynl requested review of this revision.Aug 10 2022, 9:18 AM
piggynl edited the summary of this revision. (Show Details)Aug 10 2022, 10:10 AM
piggynl added inline comments.Aug 10 2022, 10:10 AM
llvm/lib/CodeGen/BranchRelaxation.cpp
119

It is possible to use SmallSet instead of SmallVector for RestoreBlocks. It will enable this function to find the reusable block in O(log N) time, but It will take a lot of work. We have to implement a comparator for MachineBasicBlock like FunctionComparator. Also, we cannot reuse MachineInstr::isIdenticalTo() and MachineOperand::isIdenticalTo() because they can only return the equality.

I have no idea how frequently is register spill required for branch relaxation in AMDGPU, but in RISCV, it almost never happens. Also, since RISCV generates a same restore block consisting only ld s11, 0(sp) for all jumps to one destination (see D130560), there should be no difference for RISCV to use SmallVector or SmallSet.

llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
2036

I have no knowledge of AMDGPU and don't know how to construct some branches that can share or cann't share a same restore block, so please kindly let me know how to construct the test properly.

This is the first added function. I copied @spill() in this file and simply added another branch after the previous one. The two branches look to share the same restore block .LBB2_5.

2998

This is the second added function. I copied @spill_func() in this file and simply added another branch after the previous one. The two branches look don't have a same restore block.

3405–3415

and I don't know why this sequence is forming. Is this intended?

piggynl updated this revision to Diff 451626.Aug 10 2022, 2:08 PM
  • Add test for RISCV.
craig.topper added inline comments.Aug 12 2022, 11:39 AM
llvm/lib/CodeGen/BranchRelaxation.cpp
66

How often are there RestoreBlocks? I wonder if this makes sense to use TinyPtrVector so that we don't enlarge the BasicBlockInfo as much. This SmallVector is probably at least 24 bytes?

craig.topper added inline comments.Aug 12 2022, 11:43 AM
llvm/test/CodeGen/RISCV/branch-relaxation.ll
1299 ↗(On Diff #451626)

Pre-commit the test?

piggynl added inline comments.Aug 14 2022, 10:38 AM
llvm/test/CodeGen/RISCV/branch-relaxation.ll
1433 ↗(On Diff #451626)

I think this %dest_2 should be %dest_1. Creating D131863 to fix this. I'll pre-commit this @relax_jal_spill_32_deduplicate_restore_block() test based on D131863.

piggynl updated this revision to Diff 452545.Aug 14 2022, 11:26 AM
piggynl edited the summary of this revision. (Show Details)
  • Change the type of BasicBlockInfo::RestoreBlocks from SmallVector to TinyPtrVector.
  • Rebase onto pre-committed test D131865.
piggynl marked 2 inline comments as done.Aug 14 2022, 11:29 AM
piggynl added inline comments.
llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
2036

Moved these comments about tests for AMDGPU to D131865.

Ping for review :)

Ping for review :)

When would an inserted restore block ever not be identical with the same destination?

llvm/lib/CodeGen/BranchRelaxation.cpp
92

This loop seems overcomplicated for a comparison. Can you just do a range loop, comparing identical instructions until the first terminator?

96

s/has/have/

103

Does this also need to consider the various flags on the block (e.g. alignment)?

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
984

Why is this a callback? Why doesn't the pass just deduplicate to begin with and pass in the new block?

piggynl planned changes to this revision.Nov 22 2022, 11:08 PM
piggynl updated this revision to Diff 493855.Feb 1 2023, 12:55 AM
piggynl marked 2 inline comments as done.
  • Rebase
  • Fix typo
  • Simplify the loop for the comparison in BasicBlockInfo::isRestoreBlockIdentical()
piggynl marked an inline comment as not done.Feb 1 2023, 12:56 AM

Currently SIInstrinfo::insertIndirectBranch do something like this:

  1. Create a new virtual register PCReg and two symbols OffsetLo and OffsetHi.
  1. Before the branch instruction, insert instructions to calculate the branch offset with PCReg, OffsetLo, and OffsetHi.
  1. Scavenge a free SGPR, or spill an SGPR to VGPR to make it free.
  1. Replace PCReg with the free SGPR.
  1. Set the values of OffsetLo and OffsetHi.

Only when the spill happens, a restore block will be used and should be deduplicated.

When would an inserted restore block ever not be identical with the same destination?

The restore block will not be identical when the used SGPR or VGPR is different.

llvm/lib/CodeGen/BranchRelaxation.cpp
103

No target is constructing the restore block with flagged instructions at this time. I've added a FIXME here.

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
984

In TargetInstrinfo::insertIndirectBranch(), targets first optionally build the restore block, then build the indirect branch points to the restore block. The deduplication is only possible in the middle of the two processes. Both two processes may be heavy (for example SIInstrinfo::insertIndirectBranch()). I think splitting insertIndirectBranch() into two functions is not a good idea because it will increase the complexity for targets doesn't need any restore block.

piggynl updated this revision to Diff 493973.Feb 1 2023, 8:39 AM

Update code and tests for LoongArch.

luke957 removed a subscriber: luke957.Feb 2 2023, 11:30 PM

Ping for review :)

foad added inline comments.Feb 28 2023, 2:23 AM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
600–605

The new argument needs documentation.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2432

Why is this removed?

piggynl updated this revision to Diff 501746.Mar 1 2023, 10:43 PM

Add a comment about parameter DeduplicateRestoreBB of function insertIndirectBranch.

piggynl marked 2 inline comments as done.Mar 1 2023, 10:43 PM
piggynl added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2432

Because D106449 has already addressed this case.

foad added inline comments.Mar 2 2023, 3:21 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2432

Then please commit it separately.

piggynl updated this revision to Diff 501896.Mar 2 2023, 9:45 AM
piggynl marked an inline comment as done.

Split irrelevant comments change to 8fccdfa.

piggynl marked an inline comment as done.Mar 2 2023, 9:46 AM

Hi, is there anything else I can improve?

qcolombet added inline comments.Mar 15 2023, 1:34 AM
llvm/lib/CodeGen/BranchRelaxation.cpp
103

Could we bail out on that or at least assert?

dhoekwater added inline comments.Aug 31 2023, 5:30 PM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
600–605

Is it necessary to modify the interface of this function in this way? Since you're delegating the "is the new restore block identical to any old ones?" logic to the caller anyway, why not leave the caller responsible for deduplicating blocks?

If you were to add a new virtual void retargetIndirectBranch(MachineBasicBlock &MBB, MachineBasicBlock &NewDestBB) hook to TargetInstrInfo, you could keep the deduplication logic one level up by doing something like:

insertIndirectBranch(...);
if (!restoreBlock->empty() && isDuplicate(restoreBlock))
  retargetIndirectBranch(MBB, duplicateRestoreBlock);