The current tail duplication in the machine block placement pass uses block frequency information in its cost model. But a frequency number has only relative meaning, compared with other basic blocks in the same function. A large frequency number doesn't mean the block is hot, and a small frequency number doesn't mean it is cold.
To overcome this problem, this patch uses the profile count in the cost model when it is available, so that we tail duplicate only the blocks that are really hot.
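To illustrate the idea (this is not the code in the patch), here is a minimal standalone sketch. BlockInfo, isWorthTailDuplicating, HotCountThreshold and MinRelativeFreq are hypothetical names made up for this example, not LLVM APIs, and the fallback to relative frequency when no profile exists is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a basic block; this is not LLVM's MachineBasicBlock.
struct BlockInfo {
  uint64_t RelativeFreq;  // block frequency, comparable only within one function
  bool HasProfileCount;   // true when profile data is available for the block
  uint64_t ProfileCount;  // absolute execution count; valid only if HasProfileCount
};

// Returns true if the block looks hot enough to be worth duplicating.
// HotCountThreshold and MinRelativeFreq are illustrative parameters.
inline bool isWorthTailDuplicating(const BlockInfo &BB,
                                   uint64_t HotCountThreshold,
                                   uint64_t MinRelativeFreq) {
  assert(HotCountThreshold > 0 &&
         "a zero threshold would treat every profiled block as hot");
  // With a profile, compare the absolute count against a global threshold.
  if (BB.HasProfileCount)
    return BB.ProfileCount >= HotCountThreshold;
  // Assumed fallback: without a profile, only relative frequency is known.
  return BB.RelativeFreq >= MinRelativeFreq;
}
```

With this shape, a block whose relative frequency is high but whose profile count is tiny is rejected, which is the case the frequency-only cost model gets wrong.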
When tested with spec2006int, performance did not change, while the number of tail-duplicated blocks dropped from 2376 to 1746.
In our internal testing, search1 was not impacted and search2 improved by 0.1%; another 0.1% can be achieved with a larger threshold parameter.
Nit: TailDupProfilePercentThreshold