The performance of cold functions shouldn't matter much, so add an option to mark cold functions as optsize/minsize to reduce binary size, or as optnone to reduce compile time [1]. The Clang change will come in a future patch.
Chrome size (ThinLTO + instrumented PGO):
base: 371004392
optsize: 366883296
minsize: 342128928
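
For a sense of scale, the relative savings can be worked out from the three figures above. This is only a quick arithmetic sketch: the helper name `reduction` is made up for illustration, and it assumes all three numbers are reported in the same unit (the description does not state the unit).

```python
# Sizes copied from the measurements above (unit as reported; assumed identical for all three).
SIZES = {
    "base": 371004392,
    "optsize": 366883296,
    "minsize": 342128928,
}


def reduction(base: int, new: int) -> tuple[int, float]:
    """Return (absolute saving, saving as a percentage of base)."""
    saved = base - new
    return saved, 100.0 * saved / base


if __name__ == "__main__":
    base = SIZES["base"]
    for name in ("optsize", "minsize"):
        saved, pct = reduction(base, SIZES[name])
        print(f"{name}: {saved} smaller than base ({pct:.2f}%)")
```

By this calculation optsize saves roughly 1.1% and minsize roughly 7.8% relative to base.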