Proposal: Backward-edge CFI for return statements (RCFI)
Right now, whether a function counts as "called once" (etc.) would depend on inlining decisions: although with ThinLTO the summary controls which functions are imported, all final inlining decisions are made by the individual compilation units. So without changes to this flow you would basically need these steps:
The third step would necessarily be a serialization over all thin backends.
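To make the dependence on inlining concrete, here is a minimal toy model (not LLVM code). It assumes a module is just a map from function name to the list of callees it calls, and that inlining splices the callee's body into the caller. The names `count_call_sites`, `inline`, and the example module are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_call_sites(functions):
    """Count static call sites per callee across all function bodies."""
    counts = Counter()
    for body in functions.values():
        counts.update(body)
    return counts


def inline(functions, caller, callee):
    """Return a copy of the module in which each call to `callee` inside
    `caller` is replaced by a copy of the callee's body."""
    new_body = []
    for target in functions[caller]:
        if target == callee:
            new_body.extend(functions[callee])  # duplicates callee's call sites
        else:
            new_body.append(target)
    result = dict(functions)
    result[caller] = new_body
    return result


# Toy module: each function is modeled as the list of functions it calls.
module = {
    "main": ["a", "b"],
    "a": ["helper"],
    "b": ["helper"],
    "helper": ["leaf"],
    "leaf": [],
}

before = count_call_sites(module)
after_a = inline(module, "a", "helper")
after_both = inline(after_a, "b", "helper")

print("leaf call sites before inlining:", before["leaf"])
print("leaf call sites after inlining helper into a:",
      count_call_sites(after_a)["leaf"])
print("helper call sites after inlining into a and b:",
      count_call_sites(after_both)["helper"])
```

In this model `leaf` starts out "called once", but stops being so as soon as `helper` is inlined into one of its callers, while `helper` itself ends up with no call sites at all. The classification therefore cannot be fixed before the inlining decisions are known.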
In principle, we could change ThinLTO so that all final inlining decisions are made in the thin link phase. Then we would be able to classify functions at thin link time. But I foresee that being difficult, because the code that decides whether inlining is possible is already quite complex: http://llvm-cs.pcc.me.uk/lib/Analysis/InlineCost.cpp#1466. That code would need to be re-implemented in the thin link to avoid soundness issues.
We would also need to be careful about changing the number of call sites for a particular function. At a minimum, we would need to prevent the thin backend from duplicating a call site, as that could potentially change the calling convention. So we'd need to attach the noduplicate attribute to all call sites for functions deemed to be "called once". If we allow call sites to be removed, we would need to come up with some scheme that allows the size of the jump table to vary.
To me all signs point to this being better implemented in the linker rather than LTO (or as some sort of postprocessing step over the object files produced by LTO).