This patch makes three separate changes which, when applied together, have much the same effect as D52716. They are:
- Reduce the call penalty at minsize to 15. In my testing on Arm and X86, this seemed to be the best value for reducing codesize. It is still not 0 because of inaccuracies in the inliner cost modelling, but it can be gradually decreased over time as things get better. (I did not change the Threshold, as it seemed sensible to also decrease the cost of sub-calls at minsize.)
- If a block has more than one unconditional predecessor, mark each one past the first as non-free. This models the extra branch costs for loops (but can cause problems for cases where the blocks are not in the form they will appear in assembly).
- GEPs that are used by phis are not free. This attempts to capture the GEPs in loop IVs. (I'm not sure if there is a better way to capture that.)
All together they make us much less likely to inline small loops at minsize. The patch in D52716 is still better for total codesize in my testing, but this attempts to model things more precisely.
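As a rough illustration of how the three rules combine, here is a standalone toy model. It is not the code in this patch and does not use any LLVM API: `ToyBlock`, `toyMinSizeCost` and `kInstrCost` are hypothetical names made up for the sketch, and the per-instruction cost of 5 is an assumption. Only the call penalty of 15 comes from the description above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a basic block. Not an LLVM type.
struct ToyBlock {
    int unconditionalPreds; // predecessors that end in an unconditional branch
    int calls;              // call instructions in the block
    int gepsUsedByPhi;      // GEPs with a phi user (likely loop IVs)
    int otherGeps;          // GEPs assumed to fold into addressing modes
};

constexpr int kInstrCost = 5;           // assumed cost of one instruction
constexpr int kMinSizeCallPenalty = 15; // call penalty at minsize, from the description

// Sums the extra cost the three rules would add for a callee body.
int toyMinSizeCost(const std::vector<ToyBlock>& blocks) {
    int cost = 0;
    for (const ToyBlock& b : blocks) {
        assert(b.unconditionalPreds >= 0 && b.calls >= 0 &&
               b.gepsUsedByPhi >= 0 && b.otherGeps >= 0);
        // Rule 1: each call pays the reduced minsize penalty.
        cost += b.calls * kMinSizeCallPenalty;
        // Rule 2: every unconditional predecessor past the first is non-free.
        if (b.unconditionalPreds > 1)
            cost += (b.unconditionalPreds - 1) * kInstrCost;
        // Rule 3: GEPs used by phis are not free; other GEPs stay free here.
        cost += b.gepsUsedByPhi * kInstrCost;
    }
    return cost;
}
```

In this toy model, a loop header reached from both a preheader and a latch (two unconditional predecessors) with one IV GEP costs 10, where it would cost 0 if the branch and the GEP were both treated as free. That difference is what pushes small loops away from being inlined.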