Currently, -flto-unit is specified whenever LTO options are used
(unless using the old LTO API). This causes vtable definitions to be
processed with regular LTO, which CFI and whole-program vtable
optimizations require because they must modify the vtables across the
whole program.
However, the extra regular LTO processing adds non-negligible overhead,
and every ThinLTO compile currently pays it, even when neither CFI nor
-fwhole-program-vtables is in use. Since -flto-unit is only needed for
those features, only enable it when one of them is requested.
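The proposed rule can be summarized as a small predicate. This is a minimal C++ sketch under stated assumptions, not the actual driver change: the `LTOOptions` struct and the `shouldEnableLTOUnit` function are hypothetical names, and the conditions are taken only from the description above (LTO in use, not the old LTO API, and CFI or -fwhole-program-vtables requested).

```cpp
#include <cassert>

// Hypothetical summary of the options that matter for this decision.
// These names are illustrative; they are not clang driver identifiers.
struct LTOOptions {
  bool usingLTO = false;            // some LTO option (e.g. ThinLTO) is used
  bool legacyLTOAPI = false;        // target links with the old LTO API
  bool cfi = false;                 // a CFI sanitizer is enabled
  bool wholeProgramVtables = false; // -fwhole-program-vtables is given
};

// Returns true if -flto-unit should be passed under the proposed rule:
// only when LTO is in use with the new API *and* a feature that must
// rewrite vtables across the whole program is requested.
bool shouldEnableLTOUnit(const LTOOptions &opts) {
  if (!opts.usingLTO || opts.legacyLTOAPI)
    return false;
  return opts.cfi || opts.wholeProgramVtables;
}
```

With this rule, a plain ThinLTO compile (only `usingLTO` set) no longer gets -flto-unit, while adding either CFI or -fwhole-program-vtables turns it back on.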

It's a little confusing to talk about "LTO units" as a property of a translation unit when there is only one LTO unit per linkage unit. I think this should say that an LTO unit is the subset of the linkage unit compiled with certain flags. Then the rest of the document can talk about translation units that are or are not part of the LTO unit.