It can be profitable to outline single-block cold regions because they
may be large.
Allow outlining single-block regions if they have more than a threshold number of
non-debug, non-terminator instructions. I chose 3 as the threshold after
experimenting with several internal frameworks.
In practice, reducing the threshold further did not give much
improvement, whereas increasing it resulted in substantial regressions.
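A minimal sketch of the counting heuristic described above follows. It is standalone C++, not LLVM code. The instruction model (`Inst`) and the names `countNonDebugNonTerminator`, `shouldOutlineSingleBlock` and `kMinSingleBlockInsts` are hypothetical stand-ins. It reads "over the threshold" as strictly greater than 3.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction model (not LLVM's Instruction class):
// each instruction only records whether it is a debug intrinsic or a terminator.
struct Inst {
  bool IsDebug;
  bool IsTerminator;
};

// Threshold from the description (3, chosen empirically).
constexpr unsigned kMinSingleBlockInsts = 3;

// Count the instructions that contribute real code: skip debug-info
// instructions and the block terminator.
unsigned countNonDebugNonTerminator(const std::vector<Inst> &Block) {
  unsigned N = 0;
  for (const Inst &I : Block)
    if (!I.IsDebug && !I.IsTerminator)
      ++N;
  return N;
}

// A single-block cold region is worth outlining only if it has more than
// the threshold number of counted instructions (assumed strict comparison).
bool shouldOutlineSingleBlock(const std::vector<Inst> &Block) {
  return countNonDebugNonTerminator(Block) > kMinSingleBlockInsts;
}
```

With this reading, a block holding three ordinary instructions plus a branch is still rejected. A fourth ordinary instruction makes it eligible, while extra debug instructions never change the outcome.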
Don't you think using TTI.getInstructionCost(&I, TargetTransformInfo::TCK_CodeSize) would be the right thing to do here, instead of counting the number of instructions?
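For comparison, here is a minimal sketch of the suggested alternative: sum a per-instruction code-size cost instead of counting instructions. This is standalone C++, not LLVM code. The `CodeSizeCost` field is a hypothetical stand-in for what `TTI.getInstructionCost(&I, TargetTransformInfo::TCK_CodeSize)` would report. `CostedInst`, `singleBlockCodeSize`, `shouldOutlineSingleBlockByCost` and the reuse of 3 as a threshold in cost units are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction model: CodeSizeCost stands in for the code-size
// cost the target would report for this instruction.
struct CostedInst {
  bool IsDebug;
  bool IsTerminator;
  int CodeSizeCost;
};

// Assumed threshold, now in cost units rather than instruction count.
constexpr int kMinSingleBlockCost = 3;

// Sum the code-size cost of every non-debug, non-terminator instruction.
int singleBlockCodeSize(const std::vector<CostedInst> &Block) {
  int Cost = 0;
  for (const CostedInst &I : Block)
    if (!I.IsDebug && !I.IsTerminator)
      Cost += I.CodeSizeCost;
  return Cost;
}

// Outline a single-block region only if its total cost exceeds the threshold.
bool shouldOutlineSingleBlockByCost(const std::vector<CostedInst> &Block) {
  return singleBlockCodeSize(Block) > kMinSingleBlockCost;
}
```

The difference shows up at the edges. Two instructions with cost 2 each (total 4) qualify even though there are only two of them. Four instructions the target reports as free (cost 0) do not qualify, although a plain count would accept them.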