This is an alternative to D87479 (where the proposed change was to InstCombine).
InstCombine sinks code without regard to cost or loop structure, so we don't want it to be near the final step of the opt pipeline. With this change, an expensive fdiv in the example test gets hoisted out of a loop.
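To make the concern concrete, here is a source-level sketch of the hoisting that LICM performs. It is written in Python only for illustration. The real transform works on LLVM IR, and this is not the patch's example test. The function names `scale_in_loop` and `scale_hoisted` are hypothetical.

```python
def scale_in_loop(values, x, y):
    """Hypothetical 'before' shape: the loop-invariant x / y sits in the loop body."""
    out = []
    for v in values:
        out.append(v * (x / y))  # the same division is recomputed on every iteration
    return out


def scale_hoisted(values, x, y):
    """Hypothetical 'after' shape: the invariant division is computed once, before the loop."""
    q = x / y  # hoisted: evaluated once, like an fdiv moved to the loop preheader
    out = []
    for v in values:
        out.append(v * q)
    return out


data = [1.0, 2.0, 3.0, 4.0]
print(scale_in_loop(data, 10.0, 3.0) == scale_hoisted(data, 10.0, 3.0))  # True
```

Both versions produce identical results, because `x / y` yields the same value whether it is computed once or on every iteration. The hoisted form just executes the expensive division once instead of once per iteration.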
I have no estimate of the compile-time impact of this change, or of whether we could position the extra LICM somewhere else to make it better or cheaper. The regular (non-LTO) pipeline already has a late LICM, so plain -O? compiles don't have this problem.
Can you try something like this, ...