This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Record whether LTOUnit splitting is enabled in index
ClosedPublic

Authored by tejohnson on Oct 30 2018, 1:33 PM.

Details

Summary

Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).

The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.

This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:

  1. We are performing LowerTypeTests or Whole Program Devirtualization.
  2. There are type tests or type checked loads in the code.

Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.Oct 30 2018, 1:33 PM
pcc added a comment.Nov 9 2018, 3:59 PM

without LTOUnit (for purely summary-based class hierarchy analyses).

We might want a different name for the flag then, because a translation unit that is subject to purely summary-based class hierarchy analysis is still part of the LTO unit, it just participates in fewer optimizations.

In D53890#1293818, @pcc wrote:

without LTOUnit (for purely summary-based class hierarchy analyses).

We might want a different name for the flag then, because a translation unit that is subject to purely summary-based class hierarchy analysis is still part of the LTO unit, it just participates in fewer optimizations.

I think I am confused then about the terminology. My understanding was that the LTO unit was the part that would be linked with regular LTO. Does a linkage unit's LTO unit include both the regular and thin LTO modules? If so, what name would you suggest?

pcc added a comment.Nov 13 2018, 11:55 AM
In D53890#1293818, @pcc wrote:

without LTOUnit (for purely summary-based class hierarchy analyses).

We might want a different name for the flag then, because a translation unit that is subject to purely summary-based class hierarchy analysis is still part of the LTO unit, it just participates in fewer optimizations.

I think I am confused then about the terminology. My understanding was that the LTO unit was the part that would be linked with regular LTO. Does a linkage unit's LTO unit include both the regular and thin LTO modules?

The LTO unit is a user-facing concept, and is (currently) defined as follows in LTOVisibility.rst:

a linkage ​unit's *LTO unit* is the subset of the linkage unit that is linked together ​using link-time optimization

The splitting of a translation unit into regular and thin LTO modules is an implementation detail that allows the LTO unit features described in that document to work. So I suppose that it would be accurate to say that both the regular and thin LTO modules are part of the LTO unit, but that's just a consequence of how things are implemented. Hope that helps.

If so, what name would you suggest?

On second thoughts, it does seem reasonable for the flag to be called LTOUnit. This reflects that the translation unit is part of the LTO unit.

To go over the cases, we will have:

  1. Regular LTO, part of the LTO unit
  2. ThinLTO, "traditional" WPD, part of the LTO unit
  3. ThinLTO, not part of the LTO unit

and eventually:

  1. ThinLTO, summary-based WPD, part of the LTO unit

So it seems like you will want to add another flag to writeThinLTOBitcode when you add case 4.

By the way, how are you planning to handle the case where a translation unit compiled for summary-based WPD is linked against a translation unit compiled for traditional WPD?

lib/Bitcode/Reader/BitcodeReader.cpp
5247 ↗(On Diff #171777)

I think we will need to do something about this flag for compatibility. Right now all translation units are part of the LTO unit. So that means that once this change goes in we will be unable to link old bitcode with new bitcode. We might need a way of distinguishing old bitcode from new bitcode so that we can ignore the LTO unit flag in old bitcode.

In D53890#1297383, @pcc wrote:
In D53890#1293818, @pcc wrote:

without LTOUnit (for purely summary-based class hierarchy analyses).

We might want a different name for the flag then, because a translation unit that is subject to purely summary-based class hierarchy analysis is still part of the LTO unit, it just participates in fewer optimizations.

I think I am confused then about the terminology. My understanding was that the LTO unit was the part that would be linked with regular LTO. Does a linkage unit's LTO unit include both the regular and thin LTO modules?

The LTO unit is a user-facing concept, and is (currently) defined as follows in LTOVisibility.rst:

a linkage ​unit's *LTO unit* is the subset of the linkage unit that is linked together ​using link-time optimization

The splitting of a translation unit into regular and thin LTO modules is an implementation detail that allows the LTO unit features described in that document to work. So I suppose that it would be accurate to say that both the regular and thin LTO modules are part of the LTO unit, but that's just a consequence of how things are implemented. Hope that helps.

Ok, I read that, but it wasn't clear to me what it meant in the context of ThinLTO and regular LTO split modules.

If so, what name would you suggest?

On second thoughts, it does seem reasonable for the flag to be called LTOUnit. This reflects that the translation unit is part of the LTO unit.

To go over the cases, we will have:

  1. Regular LTO, part of the LTO unit
  2. ThinLTO, "traditional" WPD, part of the LTO unit
  3. ThinLTO, not part of the LTO unit

and eventually:

  1. ThinLTO, summary-based WPD, part of the LTO unit

I am confused on the difference between 3 and 4. What makes that augmented summary that enables the WPD (i.e. 4) imply that it is now part of the LTO unit, whereas currently with ThinLTO -fno-lto-unit which implies other whole-program optimizations (i.e. 3) it is not? Reading through https://clang.llvm.org/docs/LTOVisibility.html again, I guess it is because with WPD we need to talk about LTO visibility, and that can only be done in the context of an LTO unit property. So simply doing class hierarchy optimizations such as WPD we require that the ThinLTO module be defined as being part of the LTO unit?

It sounds to me like changing the default of the -flto-unit flag might be wrong, and what we really want is a different flag that controls the actual splitting directly?

So it seems like you will want to add another flag to writeThinLTOBitcode when you add case 4.

By the way, how are you planning to handle the case where a translation unit compiled for summary-based WPD is linked against a translation unit compiled for traditional WPD?

I think that should be an error (which this patch was intended to catch, since I was thinking LTOUnit would be disabled for case 4). But as noted above, do we want a different flag to control the splitting, then record that as a flag instead?

pcc added a comment.Nov 13 2018, 4:18 PM
In D53890#1297383, @pcc wrote:
In D53890#1293818, @pcc wrote:

without LTOUnit (for purely summary-based class hierarchy analyses).

We might want a different name for the flag then, because a translation unit that is subject to purely summary-based class hierarchy analysis is still part of the LTO unit, it just participates in fewer optimizations.

I think I am confused then about the terminology. My understanding was that the LTO unit was the part that would be linked with regular LTO. Does a linkage unit's LTO unit include both the regular and thin LTO modules?

The LTO unit is a user-facing concept, and is (currently) defined as follows in LTOVisibility.rst:

a linkage ​unit's *LTO unit* is the subset of the linkage unit that is linked together ​using link-time optimization

The splitting of a translation unit into regular and thin LTO modules is an implementation detail that allows the LTO unit features described in that document to work. So I suppose that it would be accurate to say that both the regular and thin LTO modules are part of the LTO unit, but that's just a consequence of how things are implemented. Hope that helps.

Ok, I read that, but it wasn't clear to me what it meant in the context of ThinLTO and regular LTO split modules.

If so, what name would you suggest?

On second thoughts, it does seem reasonable for the flag to be called LTOUnit. This reflects that the translation unit is part of the LTO unit.

To go over the cases, we will have:

  1. Regular LTO, part of the LTO unit
  2. ThinLTO, "traditional" WPD, part of the LTO unit
  3. ThinLTO, not part of the LTO unit

and eventually:

  1. ThinLTO, summary-based WPD, part of the LTO unit

I am confused on the difference between 3 and 4. What makes that augmented summary that enables the WPD (i.e. 4) imply that it is now part of the LTO unit, whereas currently with ThinLTO -fno-lto-unit which implies other whole-program optimizations (i.e. 3) it is not? Reading through https://clang.llvm.org/docs/LTOVisibility.html again, I guess it is because with WPD we need to talk about LTO visibility, and that can only be done in the context of an LTO unit property. So simply doing class hierarchy optimizations such as WPD we require that the ThinLTO module be defined as being part of the LTO unit?

Yes. Basically when you're adding summary-based WPD you're adding another way in which the TU can be part of the LTO unit.

It sounds to me like changing the default of the -flto-unit flag might be wrong, and what we really want is a different flag that controls the actual splitting directly?

That sounds like it would be simpler. We could present a choice between "use summary-based WPD" and "use traditional WPD", where the default is to use summary-based WPD, and record that in a flag. Then we wouldn't need to change the definition of an LTO unit, and I think we would only need to track whether that flag is inconsistent.

In D53890#1297809, @pcc wrote:

It sounds to me like changing the default of the -flto-unit flag might be wrong, and what we really want is a different flag that controls the actual splitting directly?

That sounds like it would be simpler. We could present a choice between "use summary-based WPD" and "use traditional WPD", where the default is to use summary-based WPD, and record that in a flag. Then we wouldn't need to change the definition of an LTO unit, and I think we would only need to track whether that flag is inconsistent.

How about I then add optional argument to -fwhole-program-vtables. I.e. "=index" and "=ir" (not sure those are the best names, what do you suggest?), and by default -fwhole-program-vtables would be the same as "-fwhole-program-vtables=index". Although we'd probably want to stage the changes so that the initial default is "=ir", so that we can update internal uses of the option, then flip the default. Eventually, -flto=thin would imply -fwhole-program-vtables=index.

CodeGenOpts.LTOUnit stays the same (default will stay on, controls insertion of type metadata)
CodeGenOpts.WholeProgramVTables stays a bool (enabled if any flavor of -fwhole-program-vtables is enabled, controls insertion of type test etc intrinsics)
Add a new CodeGenOpts.SplitLTOUnit bool which would only be set if CFI is enabled or if -fwhole-program-vtables=ir is specified. It will control whether the module is split in the presence of type metadata.

We then also record whether the new SplitLTOUnit is set in the index to detect mismatches during the thin link.

lib/Bitcode/Reader/BitcodeReader.cpp
5247 ↗(On Diff #171777)

I think I can fix this by flipping the sense of the new flag - i.e. set the flag to true in the (soon to be default) non-split case but unset in the (currently default) SplitLTOUnit case.

tejohnson updated this revision to Diff 174328.Nov 15 2018, 9:33 PM
  • Switch to an EnableSplitLTOUnit flag.
  • Record the value of !EnableSplitLTOUnit in the bitcode for compatibility with current code that has splitting enabled by default
  • As discussed off-patch, will use a new, separate option to control this

splitting, here I am using -f[no]split-lto-unit.

  • I'll update the patch title and summary once we converge on flag/behavior.

Update a couple tests due to new default of splitting off and new flag.

tejohnson updated this revision to Diff 174834.Nov 20 2018, 1:51 PM

Changed to use a new module flag to represent whether splitting enabled.
This has a couple advantages:

  1. Cleaner - doesn't require threading the flag through various interfaces.
  2. For my follow on patch to add new summary info for index-based WPD,

need to know whether splitting enabled during buildModuleSummaryIndex,
and it is not clear how to pass down a flag when this is invoked via
ModuleSummaryIndexAnalysis::run* invoked by getResult<>/getAnalysis<>
from the BitcodeWriterPass. Adding the flag to the index after the index
is built (previous version) is too late.

tejohnson retitled this revision from [LTO] Record LTOUnit flag in index and validate during LTO link to [LTO] Record whether LTOUnit splitting is enabled in index.Nov 20 2018, 2:02 PM
tejohnson edited the summary of this revision. (Show Details)
ormris added a subscriber: ormris.Nov 20 2018, 2:08 PM
tejohnson updated this revision to Diff 176219.Nov 30 2018, 4:21 PM

Add an option to opt so that we can use the ThinLTOBitcodeWriter for
non-split ThinLTO bitcode, to enable testing the same path used when
compiling ThinLTO code via clang.

ping on this and related patches

pcc added a comment.Dec 18 2018, 9:56 PM

This means that

clang -c -flto=thin foo.c
[upgrade clang]
clang -c -flto=thin bar.c
clang foo.o bar.o

will fail unless you add -fsplit-lto-unit to the second invocation. Given that the alternative is that we potentially silently miscompile, maybe that's fine. It would be possible to handle this case without aborting or miscompiling by splitting bar.o inside the linker and treating it as if it had always been split, but given that there's a workaround maybe that's not that critical.

lib/Bitcode/Reader/BitcodeReader.cpp
5934 ↗(On Diff #176219)

If we weren't emitting FS_FLAGS into module summaries before then can't we return true here and avoid needing to invert the flag in the bitcode?

tools/opt/opt.cpp
106 ↗(On Diff #176219)

Wouldn't it be better to fix the test cases to be explicit instead?

In D53890#1335770, @pcc wrote:

This means that

clang -c -flto=thin foo.c
[upgrade clang]
clang -c -flto=thin bar.c
clang foo.o bar.o

will fail unless you add -fsplit-lto-unit to the second invocation. Given that the alternative is that we potentially silently miscompile, maybe that's fine. It would be possible to handle this case without aborting or miscompiling by splitting bar.o inside the linker and treating it as if it had always been split, but given that there's a workaround maybe that's not that critical.

I would have expected this to link successfully, but disable any optimizations that we don’t have enough information to perform safely.

tejohnson marked 2 inline comments as done.Dec 19 2018, 10:39 AM
In D53890#1335770, @pcc wrote:

This means that

clang -c -flto=thin foo.c
[upgrade clang]
clang -c -flto=thin bar.c
clang foo.o bar.o

will fail unless you add -fsplit-lto-unit to the second invocation. Given that the alternative is that we potentially silently miscompile, maybe that's fine. It would be possible to handle this case without aborting or miscompiling by splitting bar.o inside the linker and treating it as if it had always been split,

In a distributed build context you'd also need to record something in the index to get the backend clang invocation to also split bar.o. I'm not sure it is worth doing all this.

but given that there's a workaround maybe that's not that critical.

I would have expected this to link successfully, but disable any optimizations that we don’t have enough information to perform safely.

From what I understand, this wouldn't just be disabling the LTO link time whole program optimizations, but also flagging it in the index and doing some lowering in the backend of type tests (@pcc is that correct?).

Is it unreasonable to expect that the user can simply add the flag indicated by the error? The downside of that though is that users may just add the flag to their build files and unnecessarily get splitting from there on out.

lib/Bitcode/Reader/BitcodeReader.cpp
5934 ↗(On Diff #176219)

Yes I think you are right - we only had FS_FLAGS on the combined indexes before, so that should work.

tools/opt/opt.cpp
106 ↗(On Diff #176219)

You mean change the default of this to be false and update tests? Yes, I can do that and you are right it probably is better as the default is changing in clang.

pcc added a comment.Dec 19 2018, 11:04 AM
In D53890#1335770, @pcc wrote:

This means that

clang -c -flto=thin foo.c
[upgrade clang]
clang -c -flto=thin bar.c
clang foo.o bar.o

will fail unless you add -fsplit-lto-unit to the second invocation. Given that the alternative is that we potentially silently miscompile, maybe that's fine. It would be possible to handle this case without aborting or miscompiling by splitting bar.o inside the linker and treating it as if it had always been split,

In a distributed build context you'd also need to record something in the index to get the backend clang invocation to also split bar.o. I'm not sure it is worth doing all this.

but given that there's a workaround maybe that's not that critical.

I would have expected this to link successfully, but disable any optimizations that we don’t have enough information to perform safely.

Yes, that sounds a lot better than my suggestion.

From what I understand, this wouldn't just be disabling the LTO link time whole program optimizations, but also flagging it in the index and doing some lowering in the backend of type tests (@pcc is that correct?).

The way that I see this working is that if we see a mix of old and new bitcode (or a mix of split and non-split) we would need to set a flag in the index that would cause WholeProgramDevirt to be disabled, and would also cause LowerTypeTests to error out if there are any llvm.type.tests to be lowered. Given that the error behaviour would be restricted to users of -fsanitize=cfi (which is not a commonly used feature) and that you already need to split in order to use CFI, this seems quite reasonable to me.

Is it unreasonable to expect that the user can simply add the flag indicated by the error? The downside of that though is that users may just add the flag to their build files and unnecessarily get splitting from there on out.

What do you think about the behaviour that I've described above? It seems relatively straightforward to implement to me, but I might be missing something.

In D53890#1336551, @pcc wrote:
In D53890#1335770, @pcc wrote:

This means that

clang -c -flto=thin foo.c
[upgrade clang]
clang -c -flto=thin bar.c
clang foo.o bar.o

will fail unless you add -fsplit-lto-unit to the second invocation. Given that the alternative is that we potentially silently miscompile, maybe that's fine. It would be possible to handle this case without aborting or miscompiling by splitting bar.o inside the linker and treating it as if it had always been split,

In a distributed build context you'd also need to record something in the index to get the backend clang invocation to also split bar.o. I'm not sure it is worth doing all this.

but given that there's a workaround maybe that's not that critical.

I would have expected this to link successfully, but disable any optimizations that we don’t have enough information to perform safely.

Yes, that sounds a lot better than my suggestion.

From what I understand, this wouldn't just be disabling the LTO link time whole program optimizations, but also flagging it in the index and doing some lowering in the backend of type tests (@pcc is that correct?).

The way that I see this working is that if we see a mix of old and new bitcode (or a mix of split and non-split) we would need to set a flag in the index that would cause WholeProgramDevirt to be disabled, and would also cause LowerTypeTests to error out if there are any llvm.type.tests to be lowered. Given that the error behaviour would be restricted to users of -fsanitize=cfi (which is not a commonly used feature) and that you already need to split in order to use CFI, this seems quite reasonable to me.

Right - I forgot that we shouldn't have any type tests to lower in this case, so it becomes pretty straightforward. Since CFI causes splitting to continue to be enabled (in D53891), we shouldn't have any issues.

Is it unreasonable to expect that the user can simply add the flag indicated by the error? The downside of that though is that users may just add the flag to their build files and unnecessarily get splitting from there on out.

What do you think about the behaviour that I've described above? It seems relatively straightforward to implement to me, but I might be missing something.

SGTM, thanks

tejohnson marked 2 inline comments as done.Dec 21 2018, 9:43 AM
In D53890#1336551, @pcc wrote:
In D53890#1335770, @pcc wrote:

This means that

clang -c -flto=thin foo.c
[upgrade clang]
clang -c -flto=thin bar.c
clang foo.o bar.o

will fail unless you add -fsplit-lto-unit to the second invocation. Given that the alternative is that we potentially silently miscompile, maybe that's fine. It would be possible to handle this case without aborting or miscompiling by splitting bar.o inside the linker and treating it as if it had always been split,

In a distributed build context you'd also need to record something in the index to get the backend clang invocation to also split bar.o. I'm not sure it is worth doing all this.

but given that there's a workaround maybe that's not that critical.

I would have expected this to link successfully, but disable any optimizations that we don’t have enough information to perform safely.

Yes, that sounds a lot better than my suggestion.

From what I understand, this wouldn't just be disabling the LTO link time whole program optimizations, but also flagging it in the index and doing some lowering in the backend of type tests (@pcc is that correct?).

The way that I see this working is that if we see a mix of old and new bitcode (or a mix of split and non-split) we would need to set a flag in the index that would cause WholeProgramDevirt to be disabled, and would also cause LowerTypeTests to error out if there are any llvm.type.tests to be lowered. Given that the error behaviour would be restricted to users of -fsanitize=cfi (which is not a commonly used feature) and that you already need to split in order to use CFI, this seems quite reasonable to me.

Right - I forgot that we shouldn't have any type tests to lower in this case, so it becomes pretty straightforward. Since CFI causes splitting to continue to be enabled (in D53891), we shouldn't have any issues.

Is it unreasonable to expect that the user can simply add the flag indicated by the error? The downside of that though is that users may just add the flag to their build files and unnecessarily get splitting from there on out.

What do you think about the behaviour that I've described above? It seems relatively straightforward to implement to me, but I might be missing something.

SGTM, thanks

Ended up emitting an error in either WPD or lower type tests when only some modules split and we have type tests (or type checked loads in WPD).

tejohnson updated this revision to Diff 179304.Dec 21 2018, 9:56 AM

Implement changes discussed in review. Also required lots of test updates (for inverted 'opt' option default and inverted flag meaning). Added test for new error.

ping - @pcc do you have any more comments or can this be committed?

pcc accepted this revision.Jan 10 2019, 4:32 PM

LGTM

lib/Transforms/IPO/LowerTypeTests.cpp
1705 ↗(On Diff #179304)

I'd say "if only some of the modules were split". Same in WholeProgramDevirt.

lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
421 ↗(On Diff #179304)

This comment isn't accurate anymore. I'd probably remove it since the code speaks for itself.

442 ↗(On Diff #179304)

Likewise

This revision is now accepted and ready to land.Jan 10 2019, 4:32 PM
tejohnson marked 5 inline comments as done.

Address comments.

tejohnson added inline comments.Jan 11 2019, 10:34 AM
lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
421 ↗(On Diff #179304)

I updated the comment and left it, think it is useful to have a header comment.

442 ↗(On Diff #179304)

ditto

tejohnson edited the summary of this revision. (Show Details)Jan 11 2019, 10:35 AM
This revision was automatically updated to reflect the committed changes.