This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Enable LTOUnit only when it is needed
Abandoned (Public)

Authored by tejohnson on Oct 22 2018, 1:09 PM.

Details

Reviewers
pcc
Summary

Currently, -flto-unit is specified whenever LTO options are used
(unless using the old LTO API). This causes vtable defs to be processed
using regular LTO, which is needed for CFI and whole program vtable
optimizations, since they need to modify the vtables in a whole program
manner.

However, this causes non-negligible overhead due to the regular
LTO processing. Since this isn't needed when not using CFI or
-fwhole-program-vtables, only enable -flto-unit in those cases.
Otherwise all ThinLTO compiles pay the overhead, even when not needed.
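
A minimal sketch of the proposed decision, for illustration only. The CompileOptions struct and shouldEnableLTOUnit function are hypothetical names, not Clang's actual driver code; only the three flags named in the comments come from the summary above.

```cpp
// Hypothetical, simplified model of the relevant options; these are
// illustrative names, not Clang's actual driver types.
struct CompileOptions {
  bool LTO = false;                  // some LTO mode was requested
  bool SanitizeCFI = false;          // -fsanitize=cfi
  bool WholeProgramVTables = false;  // -fwhole-program-vtables
};

// Proposed behavior: pass -flto-unit only when a feature that must rewrite
// vtables across the whole program is enabled.
bool shouldEnableLTOUnit(const CompileOptions &Opts) {
  if (!Opts.LTO)
    return false;
  return Opts.SanitizeCFI || Opts.WholeProgramVTables;
}
```

Under this sketch, a plain ThinLTO compile without CFI or -fwhole-program-vtables would no longer route vtable definitions through regular LTO.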

Event Timeline

tejohnson created this revision. Oct 22 2018, 1:09 PM
pcc requested changes to this revision. Oct 22 2018, 1:42 PM

The reason why LTO unit is always enabled is so that you can link translation units compiled with -fsanitize=cfi and/or -fwhole-program-vtables against translation units compiled without CFI/WPD. With this change we will see miscompiles in the translation units compiled with CFI/WPD if they use vtables in the translation units compiled without CFI/WPD. If we really need this option I think it should be an opt out.

This revision now requires changes to proceed. Oct 22 2018, 1:42 PM
In D53524#1271357, @pcc wrote:

The reason why LTO unit is always enabled is so that you can link translation units compiled with -fsanitize=cfi and/or -fwhole-program-vtables against translation units compiled without CFI/WPD. With this change we will see miscompiles in the translation units compiled with CFI/WPD if they use vtables in the translation units compiled without CFI/WPD. If we really need this option I think it should be an opt out.

Is there an important use case for supporting this mixing and matching? The issue is that it comes at a cost to all ThinLTO compiles for code with vtables, by requiring them all to process IR during the thin link. Can we detect that TUs compiled with -flto-unit are being mixed with those built without -flto-unit at thin link time and issue an error?

In D53524#1271357, @pcc wrote:

The reason why LTO unit is always enabled is so that you can link translation units compiled with -fsanitize=cfi and/or -fwhole-program-vtables against translation units compiled without CFI/WPD. With this change we will see miscompiles in the translation units compiled with CFI/WPD if they use vtables in the translation units compiled without CFI/WPD. If we really need this option I think it should be an opt out.

Is there an important use case for supporting this mixing and matching? The issue is that it comes at a cost to all ThinLTO compiles for code with vtables, by requiring them all to process IR during the thin link.

Ping on the question of why this mode needs to be default. If it was a matter of a few percent overhead that would be one thing, but we're talking a *huge* overhead (as noted off-patch for my app I'm seeing >20x thin link time currently, and with improvements to the hashing to always get successful splitting we could potentially get down to closer to 2x - still a big overhead). This kind of overhead should be opt-in. The average ThinLTO user is not going to realize they are paying a big overhead because CFI is always pre-enabled.

Can we detect that TUs compiled with -flto-unit are being mixed with those built without -flto-unit at thin link time and issue an error?

This would be doable pretty easily. E.g. add a flag at the index level that the module would have been split but wasn't. Users who get the error and want to support always-enabled CFI could opt in via -flto-unit.
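
A rough sketch of what such a thin-link check could look like, assuming each module's summary records two bits. The ModuleIndexFlags struct, its field names, and hasInconsistentLTOUnit are hypothetical, chosen only to illustrate the idea; they are not LLVM's actual index representation.

```cpp
#include <string>
#include <vector>

// Hypothetical per-module flags recorded in the summary index.
struct ModuleIndexFlags {
  std::string Name;
  bool BuiltWithLTOUnit;  // compiled with -flto-unit
  bool SkippedSplit;      // would have been split, but -flto-unit was off
};

// Returns true when modules built with -flto-unit are mixed with modules
// that would have been split but were not; the thin link would then
// report an error instead of silently miscompiling.
bool hasInconsistentLTOUnit(const std::vector<ModuleIndexFlags> &Modules) {
  bool AnyInUnit = false;
  bool AnySkipped = false;
  for (const ModuleIndexFlags &M : Modules) {
    if (M.BuiltWithLTOUnit)
      AnyInUnit = true;
    else if (M.SkippedSplit)
      AnySkipped = true;
  }
  return AnyInUnit && AnySkipped;
}
```

In this sketch, a link consisting only of modules built without -flto-unit, or only of modules built with it, passes the check; only the mixed case is flagged.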

pcc added a comment. Oct 25 2018, 10:16 AM
In D53524#1271357, @pcc wrote:

The reason why LTO unit is always enabled is so that you can link translation units compiled with -fsanitize=cfi and/or -fwhole-program-vtables against translation units compiled without CFI/WPD. With this change we will see miscompiles in the translation units compiled with CFI/WPD if they use vtables in the translation units compiled without CFI/WPD. If we really need this option I think it should be an opt out.

Is there an important use case for supporting this mixing and matching? The issue is that it comes at a cost to all ThinLTO compiles for code with vtables, by requiring them all to process IR during the thin link.

Ping on the question of why this mode needs to be default. If it was a matter of a few percent overhead that would be one thing, but we're talking a *huge* overhead (as noted off-patch for my app I'm seeing >20x thin link time currently, and with improvements to the hashing to always get successful splitting we could potentially get down to closer to 2x - still a big overhead). This kind of overhead should be opt-in. The average ThinLTO user is not going to realize they are paying a big overhead because CFI is always pre-enabled.

Well, the intent was always that the overhead would be minimal, which is why things are set up the way that they are. But it doesn't sound like anyone is going to have the time to fully address the performance problems that you've seen any time soon, so maybe it would be fine to introduce the -flto-unit flag. I guess we can always change the flag so that it has no effect if/when the performance problem is addressed.

Can we detect that TUs compiled with -flto-unit are being mixed with those built without -flto-unit at thin link time and issue an error?

This would be doable pretty easily. E.g. add a flag at the index level that the module would have been split but wasn't. Users who get the error and want to support always-enabled CFI could opt in via -flto-unit.

Yes. I don't think we should make a change like this unless there is something like that in place, though. The documentation (LTOVisibility.rst) needs to be updated too.

In D53524#1276038, @pcc wrote:
In D53524#1271357, @pcc wrote:

The reason why LTO unit is always enabled is so that you can link translation units compiled with -fsanitize=cfi and/or -fwhole-program-vtables against translation units compiled without CFI/WPD. With this change we will see miscompiles in the translation units compiled with CFI/WPD if they use vtables in the translation units compiled without CFI/WPD. If we really need this option I think it should be an opt out.

Is there an important use case for supporting this mixing and matching? The issue is that it comes at a cost to all ThinLTO compiles for code with vtables, by requiring them all to process IR during the thin link.

Ping on the question of why this mode needs to be default. If it was a matter of a few percent overhead that would be one thing, but we're talking a *huge* overhead (as noted off-patch for my app I'm seeing >20x thin link time currently, and with improvements to the hashing to always get successful splitting we could potentially get down to closer to 2x - still a big overhead). This kind of overhead should be opt-in. The average ThinLTO user is not going to realize they are paying a big overhead because CFI is always pre-enabled.

Well, the intent was always that the overhead would be minimal, which is why things are set up the way that they are. But it doesn't sound like anyone is going to have the time to fully address the performance problems that you've seen any time soon, so maybe it would be fine to introduce the -flto-unit flag. I guess we can always change the flag so that it has no effect if/when the performance problem is addressed.

Just to clarify, since there is already a -flto-unit flag: it is currently a cc1 flag, did you want it made into a driver option as well?

Can we detect that TUs compiled with -flto-unit are being mixed with those built without -flto-unit at thin link time and issue an error?

This would be doable pretty easily. E.g. add a flag at the index level that the module would have been split but wasn't. Users who get the error and want to support always-enabled CFI could opt in via -flto-unit.

Yes. I don't think we should make a change like this unless there is something like that in place, though. The documentation (LTOVisibility.rst) needs to be updated too.

Ok, let me work on that now and we can get that in before this one.

pcc added a comment. Oct 29 2018, 1:16 PM
In D53524#1276038, @pcc wrote:
In D53524#1271357, @pcc wrote:

The reason why LTO unit is always enabled is so that you can link translation units compiled with -fsanitize=cfi and/or -fwhole-program-vtables against translation units compiled without CFI/WPD. With this change we will see miscompiles in the translation units compiled with CFI/WPD if they use vtables in the translation units compiled without CFI/WPD. If we really need this option I think it should be an opt out.

Is there an important use case for supporting this mixing and matching? The issue is that it comes at a cost to all ThinLTO compiles for code with vtables, by requiring them all to process IR during the thin link.

Ping on the question of why this mode needs to be default. If it was a matter of a few percent overhead that would be one thing, but we're talking a *huge* overhead (as noted off-patch for my app I'm seeing >20x thin link time currently, and with improvements to the hashing to always get successful splitting we could potentially get down to closer to 2x - still a big overhead). This kind of overhead should be opt-in. The average ThinLTO user is not going to realize they are paying a big overhead because CFI is always pre-enabled.

Well, the intent was always that the overhead would be minimal, which is why things are set up the way that they are. But it doesn't sound like anyone is going to have the time to fully address the performance problems that you've seen any time soon, so maybe it would be fine to introduce the -flto-unit flag. I guess we can always change the flag so that it has no effect if/when the performance problem is addressed.

Just to clarify, since there is already a -flto-unit flag: it is currently a cc1 flag, did you want it made into a driver option as well?

Yes, that's what I had in mind.

Can we detect that TUs compiled with -flto-unit are being mixed with those built without -flto-unit at thin link time and issue an error?

This would be doable pretty easily. E.g. add a flag at the index level that the module would have been split but wasn't. Users who get the error and want to support always-enabled CFI could opt in via -flto-unit.

Yes. I don't think we should make a change like this unless there is something like that in place, though. The documentation (LTOVisibility.rst) needs to be updated too.

Ok, let me work on that now and we can get that in before this one.

In D53524#1276038, @pcc wrote:

Can we detect that TUs compiled with -flto-unit are being mixed with those built without -flto-unit at thin link time and issue an error?

This would be doable pretty easily. E.g. add a flag at the index level that the module would have been split but wasn't. Users who get the error and want to support always-enabled CFI could opt in via -flto-unit.

Yes. I don't think we should make a change like this unless there is something like that in place, though. The documentation (LTOVisibility.rst) needs to be updated too.

Ok, let me work on that now and we can get that in before this one.

Mailed D53890 for this part.

tejohnson updated this revision to Diff 172181. Nov 1 2018, 11:06 AM

Address comments:
Promote -flto-unit to clang driver option (and test it)
Adjust LTOVisibility.rst to reflect change of default and new option.

pcc added inline comments. Nov 9 2018, 3:59 PM
docs/LTOVisibility.rst
Line 9 (on Diff #172181)

It's a little confusing to talk about "LTO units" as a property of a translation unit when there is only one LTO unit per linkage unit. I think this should say that an LTO unit is the subset of the linkage unit compiled with certain flags. Then in the rest of the document you can talk about translation units that are either part of or not part of the LTO unit.

tejohnson abandoned this revision. Nov 20 2018, 2:05 PM

Abandoned in favor of new approach in D53890/D53891.