This is an archive of the discontinued LLVM Phabricator instance.

Only split cold blocks with more than a given number of instructions
AbandonedPublic

Authored by dhoekwater on Mar 2 2023, 8:23 PM.

Details

Reviewers
snehasish
Summary

On Arm, splitting a cold block may incur a thunk, a 16-byte snippet of code
that extends the range of an unconditional branch. Consequently, splitting
the block may actually inflate cold code size and hurt performance.

While thunk-aware splitting is a complex problem, only splitting cold blocks
larger than a thunk will get some wins without the risk of regression.

Diff Detail

Event Timeline

dhoekwater created this revision.Mar 2 2023, 8:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 8:23 PM
dhoekwater retitled this revision from Only split cold blocks with more than a given number of instructions. to Only split cold blocks with more than a given number of instructions.Mar 2 2023, 11:18 PM
dhoekwater published this revision for review.Mar 6 2023, 2:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2023, 2:28 PM

Adding Snehasish as a reviewer as git blame suggests snehasish has designed and written most of the MachineFunctionSplitter.

It seems to me that this patch would also need:

  • Some regression test to test it actually changes code generation (only?) when targeting Arm.
  • As is, this patch will not result in changed code generation IIUC?

It seems to me that this patch would also need:

  • Some regression test to test it actually changes code generation (only?) when targeting Arm.
  • As is, this patch will not result in changed code generation IIUC?

Yes, a small test for this would be great. The existing tests live in llvm-project/llvm/test/CodeGen/X86/machine-function-splitter.ll. I don't think this enhancement has much utility on X86, so perhaps the tests should be under CodeGen/AArch64 instead?

dhoekwater abandoned this revision.Jun 22 2023, 10:17 AM

Dropping for the time being as it doesn't look like thresholding is necessary for MFS performance.