This applies inline-cold-callsite-threshold to callsites that are cold locally (within the function) in the absence of profile information. A callsite's coldness is determined by its block frequency (BFI) relative to the caller's entry frequency. By default, a callsite is considered cold when its frequency is <= 1/100th of the caller's entry frequency. The 1% threshold allows a callsite guarded by __builtin_expect(cond, 0) inside a singly-nested loop to be considered cold.
In general this is a small size win. For example, the llvm test-suite sees a mean text size reduction of 0.03%, but there are some nice wins in large benchmarks there (sqlite sees a 23% reduction and 7zip sees a 4% reduction). There are some regressions, but those are all on benchmarks with smaller code size (<4K). This also improves an internal compression benchmark by around 8%: a cold callee is no longer inlined, which in turn allows the caller to be inlined into its own caller.
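As a rough illustration of the relative-frequency test described above, here is a minimal sketch. The function name, its parameters, and the use of plain integers are assumptions made for illustration; this is not the code in the patch and it does not use LLVM's BlockFrequencyInfo API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM's API: returns true when a callsite's block
// frequency is at most `coldPercent` percent of the caller's entry frequency.
// Assumption: frequencies are small enough that multiplying by 100 does not
// overflow a 64-bit integer.
bool isLocallyColdCallsite(uint64_t callsiteFreq, uint64_t callerEntryFreq,
                           unsigned coldPercent = 1) {
  assert(coldPercent <= 100 && "percentage must be in [0, 100]");
  // Compare callsiteFreq / callerEntryFreq <= coldPercent / 100 without
  // using division, so no precision is lost to integer truncation.
  return callsiteFreq * 100 <= callerEntryFreq * coldPercent;
}
```

With the default of 1 percent, a callsite with frequency 1 in a caller whose entry frequency is 100 counts as cold, while a callsite with frequency 2 does not.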
I am working on a similar way to identify hot callsites without profile information, but tuning that is proving harder, so I want to start with this change first.