Using AvgLoopIters on any loop is too imprecise: it makes the cost model favor users inside loop nests regardless of the actual trip count.
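To illustrate the problem, here is a minimal sketch of how a fixed per-loop iteration estimate compounds with nesting depth. It assumes the bonus is multiplied by the estimate once per enclosing loop; the names `kAssumedAvgLoopIters` and `scaledUserBonus` and the value 10 are hypothetical stand-ins, not the pass's actual code.

```cpp
#include <cmath>

// Hypothetical stand-in for an assumed average trip count per loop level.
// The value 10 is an assumption chosen for illustration only.
constexpr double kAssumedAvgLoopIters = 10.0;

// Scale a user's base bonus by the assumed trip count raised to the loop
// depth, ignoring whatever the real trip count of each loop might be.
double scaledUserBonus(double baseBonus, unsigned loopDepth) {
  return baseBonus *
         std::pow(kAssumedAvgLoopIters, static_cast<double>(loopDepth));
}
```

Under these assumptions a user at loop depth 3 gets a bonus 1000 times larger than the same user outside any loop, even if each loop only runs once or twice.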
Compile times -O3
| benchmark | nspecs before | nspecs after | instrCnt delta % |
| ClamAV | 5 | 5 | +0.006 |
| 7zip | | | -0.031 |
| tramp3d-v4 | | | -0.043 |
| kimwitu++ | | | -0.156 |
| sqlite3 | 2 | | -0.571 |
| mafft | | | -0.029 |
| lencod | | | +0.029 |
| SPASS | 2 | 2 | -0.038 |
| consumer-typeset | | | -0.045 |
| Bullet | 1 | | +0.055 |
| geomean | | | -0.083 |
Compile times LTO
| benchmark | nspecs before | nspecs after | instrCnt delta % |
| ClamAV | 1 | | -0.159 |
| 7zip | | | -0.023 |
| tramp3d-v4 | | | -0.018 |
| kimwitu++ | | | +0.016 |
| sqlite3 | 2 | 1 | +0.357 |
| mafft | | | +0.029 |
| lencod | | | 0 |
| SPASS | 1 | | -0.283 |
| consumer-typeset | 1 | | +0.539 |
| Bullet | | | +0.013 |
| geomean | | | +0.047 |
(Hi! I noticed this while trying to get a generated function specialized, so I would like to share some thoughts :) )
Function spec cost is calculated as Metrics.NumInsts * InlineConstants::getInstrCost() (i.e., without TTI per instruction cost).
I wonder whether it would make more sense to make the per-instruction cost calculation consistent.
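A rough sketch of the difference between the two approaches, assuming a flat cost of 5 per instruction and made-up per-instruction costs. `kFlatInstrCost`, `flatCost`, and `summedCost` are hypothetical names for illustration; the vector of integers stands in for whatever a TTI query would return for each instruction.

```cpp
#include <numeric>
#include <vector>

// Hypothetical flat per-instruction cost, standing in for the value
// returned by InlineConstants::getInstrCost(). The value 5 is an assumption.
constexpr int kFlatInstrCost = 5;

// Flat model: instruction count times one constant, so every instruction
// contributes the same amount regardless of what it is.
int flatCost(const std::vector<int> &perInstrCosts) {
  return static_cast<int>(perInstrCosts.size()) * kFlatInstrCost;
}

// Per-instruction model: sum an individual cost for each instruction
// (the inputs here are placeholders for target-specific costs).
int summedCost(const std::vector<int> &perInstrCosts) {
  return std::accumulate(perInstrCosts.begin(), perInstrCosts.end(), 0);
}
```

For three instructions with placeholder costs {1, 1, 20}, the flat model yields 15 while the summed model yields 22, so the two can disagree whenever a function contains a few expensive instructions.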
Also, it seems Weight could give a big bonus to functions called in a hot inner loop with PGO; I wonder how that affects code size in the PGO case.