CodeGen: BlockPlacement: Increase tail duplication size for O3.
At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon
2% slowdown in lpbench. Seems related, but couldn't completely diagnose.
Internal google benchmark:
Produces 4% improvement on internal google protocol buffer serialization