Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases that duplication doesn't increase fallthrough, or it brings too much size overhead.
To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks:
- make sure there is at least one duplication in current work set.
- the number of duplication should not exceed the number of successors.
The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain.
Can the code be restructured here so that it picks and returns the top two most profittable predecessors and perform tail dup into them?