The small size is 8, which is the worst case of the core recursive
algorithm.
The special cases use the core algorithm and append additional
instructions. We were pushing the extra instructions before checking
the profitability. This could lead to 9 and maybe 10 instructions
in the sequence, which overflows the small size.
This patch does the profitability check before inserting the
extra instructions so that we don't create 9 or 10 instruction
sequences.
Alternatively, we could bump the small size to 9 or 10, but then
we're pushing things that are never going to be used.
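
A minimal sketch of the reordering described above, not the actual
patch: the names Inst, SmallSize and tryReplace are hypothetical, and
std::vector stands in for whatever small-size container the real code
uses. The candidate length is compared against the current best before
the extra instructions are appended, so a rejected candidate never grows
past the small size.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one instruction in a sequence.
struct Inst {
  int Opcode;
  long Imm;
};

// Assumed small size: the worst case of the core algorithm.
constexpr std::size_t SmallSize = 8;

// Replace Best with Candidate + Extra only if the result is strictly
// shorter. The check runs before Extra is appended, so Candidate never
// exceeds Best.size() - 1 instructions.
bool tryReplace(std::vector<Inst> &Best, std::vector<Inst> Candidate,
                const std::vector<Inst> &Extra) {
  assert(Best.size() <= SmallSize && "core result exceeds small size");
  if (Candidate.size() + Extra.size() >= Best.size())
    return false; // Not profitable; nothing was pushed.
  Candidate.insert(Candidate.end(), Extra.begin(), Extra.end());
  Best = std::move(Candidate);
  return true;
}
```

With an 8-instruction Best, an 8-instruction Candidate and 2 extra
instructions, the call returns false and no 10-instruction sequence is
ever built.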