This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Handle register spill in branch relaxation
ClosedPublic

Authored by piggynl on Jul 26 2022, 3:24 AM.

Details

Summary

In branch relaxation pass, j's with offset over 1MiB will be relaxed to jump pseudo-instructions.

This patch allocates a stack slot for functions with a size greater than 1MiB. If the register scavenger cannot find a scratch register for jump, spill a register to the slot before the jump and restore it after the jump.

.mbb:
        foo
        j       .dest_bb
        bar
        bar
        bar
.dest_bb:
        baz

The above code will be relaxed to the following code.

.mbb:
        foo
        sd      s11, 0(sp)
        jump    .restore_bb, s11
        bar
        bar
        bar
        j       .dest_bb
.restore_bb:
        ld      s11, 0(sp)
.dest_bb:
        baz

Depends on D129999.

Diff Detail

Event Timeline

piggynl created this revision.Jul 26 2022, 3:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2022, 3:24 AM
piggynl requested review of this revision.Jul 26 2022, 3:24 AM
piggynl edited the summary of this revision. (Show Details)Jul 26 2022, 5:14 AM
piggynl added inline comments.Jul 26 2022, 5:19 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1003–1004

Do we need another spill slot to calculate the offset of the slot in eliminateFrameIndex?

craig.topper added inline comments.Jul 26 2022, 9:30 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1008

ARM places constant pools and jump tables in the middle of functions via the ConstantIslands pass . I don't think we do that on RISC-V. So I think we only need to count instruction size.

StephenFan added inline comments.Jul 26 2022, 10:51 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1109

Add this stack object as scavenge frame index in RS. The reason will be explained in the comment below.

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1003–1004

if the offset of BranchRelaxationScratchFrameIndex is out of range of Int<12>, then an extra scratch register will be created. To avoid this extra scratch register, we need to make sure that the BranchRelaxationScratchFrameIndex is close to the stack base register. Fortunately, PEI will float or sink the scavenge frame index to close to the stack base register(See https://reviews.llvm.org/D89239). But only if BranchRelaxationScratchFrameIndex is a scavenging frame index.

efriedma added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
999

You probably want to report_fatal_error if you don't have a frame index, so you get an error in release builds if the heuristic breaks somehow.

efriedma added inline comments.Jul 26 2022, 11:30 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1006

You might want to consider using a more conservative estimate for conditional branches, since they can be relaxed to unconditional branches. (I know we don't do this in the ARM function you borrowed this from, but probably we should.)

piggynl updated this revision to Diff 447918.Jul 26 2022, 8:29 PM
  • More conservative estimate for branches in EstimateFunctionSizeInBytes.
  • Remove the size of constants and jump tables in EstimateFunctionSizeInBytes.
  • Add the new stack frame index to RS in RISCVFrameLowering::processFunctionBeforeFrameFinalized.
  • Use report_fatal_error in RISCVInstrInfo::insertIndirectBranch if function size is underestimated.
piggynl updated this revision to Diff 447919.Jul 26 2022, 8:37 PM

Fix code style via clang-format.

StephenFan added inline comments.Jul 26 2022, 9:07 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1113

If it would be better to add a test case that has a large stack and the offset of BranchRelaxationScratchFrameIndex is out of range of Int<12>(if it is not floated or sunk)?

piggynl updated this revision to Diff 447937.Jul 26 2022, 11:20 PM

Add test for a large stack that requires adjusting the offset of BranchRelaxationScratchFrameIndex.

piggynl updated this revision to Diff 447938.Jul 26 2022, 11:23 PM

Fix code style via clang-format.

piggynl updated this revision to Diff 448225.Jul 27 2022, 6:16 PM

Remove unnecessary #include.

StephenFan added inline comments.Jul 27 2022, 10:58 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1108

Actually, scavenging spill slots can be reused. But here we always create a scavenging spill slot even if there is already a scavenging spill slot.

piggynl added inline comments.Jul 28 2022, 12:10 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1108

Since branch relaxation pass is after instruction scheduling, I'm not sure whether the jump could be placed in the middle of a spill from previous pass, and the slot's content could be overwritten if we reuse the slot.

I'm not familiar with instruction scheduling, so please correct me for any mistake I made. Thank you in advance!

There seems to be a problem here: if we perform only some operations on registers without sd after a jump (which can easily happen if only a few short instructions are executed between jumps), we then jump to another place. If we still restore immediately in this case, we are actually wasting resources.

StephenFan added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1108

As far as I know, llvm has not implemented global instruction scheduling, it only schedules instruction within basic blocks. Since Jump instruction is the terminator of a basic block, it may not be scheduled. Otherwise, the control flow graph would be changed. But I'm not an expert in instruction scheduling. Maybe @HsiangKai can answer this question.

hiraditya added inline comments.Jul 28 2022, 8:52 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1107

Can we make an API like isFarBranch or something like it? Other places can reuse it possibly.

piggynl updated this revision to Diff 448517.Jul 28 2022, 11:20 PM

Reuse scavenging spill slot for scratch register's spill slot in branch relaxation.

piggynl added inline comments.Jul 28 2022, 11:22 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1100

Do we need this IsLargeFunction?

There seems to be a problem here: if we perform only some operations on registers without sd after a jump (which can easily happen if only a few short instructions are executed between jumps), we then jump to another place. If we still restore immediately in this case, we are actually wasting resources.

Yes. We can construct a loop with a few instructions executed and a long jump at each iteration, making the jump scratch register saved/restored frequently. Maybe we can add a TODO in BranchRelaxation.cpp for future improvements?

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1107

There is BranchRelaxation::isBlockInRange() in BranchRelaxation pass, but we cannot use that in PEI since there are some late optimizations between PEI and BranchRelaxation (for example MachineBlockPlacement), making us cannot determine whether a branch is a far branch or not. That's also the reason why we reserve a spill slot for BranchRelaxation pass if we estimate the function has a size over 1MiB (which means it is possible that have a branch over 1MiB).

As for an API to estimate the size of a function, it is hard to achieve in a target-independent way. (See https://reviews.llvm.org/D129975) For example, ARM places constant pools and jump tables in the middle of functions via the ConstantIslands pass while RISCV doesn't, meaning that RISCV functions never contain jumps over constant pools and jump tables. As a result, in RISCV, the size of the function does not need to include the constant pool and jump table, whereas in ARM it does.

There seems to be a problem here: if we perform only some operations on registers without sd after a jump (which can easily happen if only a few short instructions are executed between jumps), we then jump to another place. If we still restore immediately in this case, we are actually wasting resources.

Yes. We can construct a loop with a few instructions executed and a long jump at each iteration, making the jump scratch register saved/restored frequently. Maybe we can add a TODO in BranchRelaxation.cpp for future improvements?

Yes, I think we should add TODO.
I have some ideas, but the expected effect is not good. If I can improve the algorithm and implement it, I will open a PR.

Ping :) Is there anything else I can do to improve this?

StephenFan added inline comments.Aug 5 2022, 8:37 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1095–1096

Capitalize i

1100

I think we need.

1100

Is this more readable?

piggynl updated this revision to Diff 450486.Aug 6 2022, 12:36 AM
  • Fix coding style via clang-tidy.
  • Move comments about calculating ScavSlotsNum closer to their codes.
  • Move tests relax_spill in branch-relaxation-spill-{32,64}.ll to relax_jal_spill_{32,64}{,_adjust_spill_slot} in branch-relaxation.ll.
piggynl marked 11 inline comments as done.Aug 6 2022, 12:38 AM
This revision is now accepted and ready to land.Aug 7 2022, 12:16 AM

Thank you! Would you mind taking a look at D129999? This patch depends on it for estimating functions' sizes.

This revision was landed with ongoing or failed builds.Aug 23 2022, 10:31 PM
This revision was automatically updated to reflect the committed changes.

The tests don't seem to have been properly updated with update_llc_test_checks.py. llvm/test/CodeGen/RISCV/branch-relaxation.ll contains RV64 RUN lines but the corresponding CHECK lines are missing in some functions.

The tests don't seem to have been properly updated with update_llc_test_checks.py. llvm/test/CodeGen/RISCV/branch-relaxation.ll contains RV64 RUN lines but the corresponding CHECK lines are missing in some functions.

Looking more closely at this, I guess you tried to only include the CHECK-RV64 and CHECK-RV32 checks when relevant. That's a good instinct but I guess it goes a bit against how we normally use update_llc_test_checks.py. My understanding of the trade-off of using that tool is that the test updates are much easier, even if sometimes the CHECKs aren't as tight as something more tailormade.

The tests don't seem to have been properly updated with update_llc_test_checks.py. llvm/test/CodeGen/RISCV/branch-relaxation.ll contains RV64 RUN lines but the corresponding CHECK lines are missing in some functions.

Looking more closely at this, I guess you tried to only include the CHECK-RV64 and CHECK-RV32 checks when relevant. That's a good instinct but I guess it goes a bit against how we normally use update_llc_test_checks.py. My understanding of the trade-off of using that tool is that the test updates are much easier, even if sometimes the CHECKs aren't as tight as something more tailormade.

You are absolutely right. I've created D132625 to update the tests. Thank you!