This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Prevent dead stripping and internalization of symbols with sections
ClosedPublic

Authored by tejohnson on Jul 19 2017, 1:28 PM.

Details

Summary

ELF linkers generate start_<secname> and stop_<secname> symbols
when there is a value in a section <secname> where the name is a valid
C identifier. If dead stripping determines that the values declared
in section <secname> are dead, and we then internalize (and delete)
such a symbol, programs that reference the corresponding start and end
section symbols will get undefined reference linking errors.

To fix this, add a flag to the IRSymtab to track when a symbol is
defined in an ELF object and has a C identifier section name. Then
use this in LTO to mark the symbol as external and visible from outside
the summary.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.Jul 19 2017, 1:28 PM
davide added a subscriber: davide.Jul 19 2017, 1:35 PM

Do you know whether LTO gets this case right?

Do you know whether LTO gets this case right?

It does not get it correct. I could move the fix into the internalize handling under llvm::internalizeModule: that would cover ThinLTO with the new LTO API, and ThinLTO and regular LTO with the legacy LTO API. However, the regular LTO handling in LTO::runRegularLTO does not use internalizeModule. I would either need to duplicate the fix for regular LTO into both the new and legacy LTO handling, or move the fix into internalizeModule and refactor runRegularLTO to use it as well. WDYT?

Do you know whether LTO gets this case right?

It does not get it correct. I could move the fix into the internalize handling under llvm::internalizeModule: that would cover ThinLTO with the new LTO API, and ThinLTO and regular LTO with the legacy LTO API. However, the regular LTO handling in LTO::runRegularLTO does not use internalizeModule. I would either need to duplicate the fix for regular LTO into both the new and legacy LTO handling, or move the fix into internalizeModule and refactor runRegularLTO to use it as well. WDYT?

I'm not sure. This is actually exposing a bigger problem that we need to solve sooner rather than later (i.e. the fact there are two APIs)
Probably we can duplicate the fix for the time being but we should also have a discussion about it.

pcc edited edge metadata.Jul 19 2017, 2:44 PM

Another possibility is to handle this in the client (i.e. the linker) instead. The client has the best idea of whether a given section is a GC root; ELF linkers can set VisibleToRegularObj on any symbol in a section named like a C identifier, and other linkers can do whatever is appropriate for their object format.

In D35639#815234, @pcc wrote:

Another possibility is to handle this in the client (i.e. the linker) instead. The client has the best idea of whether a given section is a GC root; ELF linkers can set VisibleToRegularObj on any symbol in a section named like a C identifier, and other linkers can do whatever is appropriate for their object format.

That makes sense to me. This kind of goes against the principles of the API as we're going to duplicate this at least between lld and gold, but I don't think it's unreasonable.
In particular, this seems ELF-specific so other formats might not really care, therefore the client seems to be the most obvious choice.

Do you know whether LTO gets this case right?

It does not get it correct. I could move the fix into the internalize handling under llvm::internalizeModule: that would cover ThinLTO with the new LTO API, and ThinLTO and regular LTO with the legacy LTO API. However, the regular LTO handling in LTO::runRegularLTO does not use internalizeModule. I would either need to duplicate the fix for regular LTO into both the new and legacy LTO handling, or move the fix into internalizeModule and refactor runRegularLTO to use it as well. WDYT?

I'm not sure. This is actually exposing a bigger problem that we need to solve sooner rather than later (i.e. the fact there are two APIs)
Probably we can duplicate the fix for the time being but we should also have a discussion about it.

Ironically, it is not the different APIs that are an issue - it is the different internalization handling for regular vs Thin LTO in the new API (the former updates the linkage directly, the latter invokes internalizeModule), that would provoke duplication (unless I refactored the former to also use internalizeModule).

In D35639#815234, @pcc wrote:

Another possibility is to handle this in the client (i.e. the linker) instead. The client has the best idea of whether a given section is a GC root; ELF linkers can set VisibleToRegularObj on any symbol in a section named like a C identifier, and other linkers can do whatever is appropriate for their object format.

I don't see how to get the presence of a section or the section name for a symbol when walking the symbols? I.e. either in gold-plugin via the symbol info from the linker, or in LTO::addModuleToGlobalRes via the InputFile:Symbol?

pcc added a comment.Jul 19 2017, 7:19 PM
In D35639#815234, @pcc wrote:

Another possibility is to handle this in the client (i.e. the linker) instead. The client has the best idea of whether a given section is a GC root; ELF linkers can set VisibleToRegularObj on any symbol in a section named like a C identifier, and other linkers can do whatever is appropriate for their object format.

I don't see how to get the presence of a section or the section name for a symbol when walking the symbols? I.e. either in gold-plugin via the symbol info from the linker, or in LTO::addModuleToGlobalRes via the InputFile:Symbol?

We'd need to add that information to the irsymtab and expose it via InputFile::Symbol.

In D35639#815579, @pcc wrote:
In D35639#815234, @pcc wrote:

Another possibility is to handle this in the client (i.e. the linker) instead. The client has the best idea of whether a given section is a GC root; ELF linkers can set VisibleToRegularObj on any symbol in a section named like a C identifier, and other linkers can do whatever is appropriate for their object format.

I don't see how to get the presence of a section or the section name for a symbol when walking the symbols? I.e. either in gold-plugin via the symbol info from the linker, or in LTO::addModuleToGlobalRes via the InputFile:Symbol?

We'd need to add that information to the irsymtab and expose it via InputFile::Symbol.

That's what I was afraid of. =( That seems quite a bit more extensive than simply checking whether the GV has a section and blocking internalization.
If we were to go the route you are suggesting, it looks like currently we don't iterate through the InputFile until we invoke the LTO API, when adding a module. I guess we would have to then iterate through those and update the global resolutions earlier, e.g. in the gold-plugin? I.e. in order to make it ELF specific.

pcc added a comment.Jul 19 2017, 8:02 PM
In D35639#815579, @pcc wrote:
In D35639#815234, @pcc wrote:

Another possibility is to handle this in the client (i.e. the linker) instead. The client has the best idea of whether a given section is a GC root; ELF linkers can set VisibleToRegularObj on any symbol in a section named like a C identifier, and other linkers can do whatever is appropriate for their object format.

I don't see how to get the presence of a section or the section name for a symbol when walking the symbols? I.e. either in gold-plugin via the symbol info from the linker, or in LTO::addModuleToGlobalRes via the InputFile:Symbol?

We'd need to add that information to the irsymtab and expose it via InputFile::Symbol.

That's what I was afraid of. =( That seems quite a bit more extensive than simply checking whether the GV has a section and blocking internalization.
If we were to go the route you are suggesting, it looks like currently we don't iterate through the InputFile until we invoke the LTO API, when adding a module. I guess we would have to then iterate through those and update the global resolutions earlier, e.g. in the gold-plugin? I.e. in order to make it ELF specific.

I don't follow your point about global resolutions, this should be purely a local analysis (from the linker's perspective at least). In both the gold plugin and lld an InputFile is available at the point where we compute the resolution for each symbol, so it should be a matter of copying the property from the InputFile symbol to the resolution.

In D35639#815591, @pcc wrote:
In D35639#815579, @pcc wrote:
In D35639#815234, @pcc wrote:

Another possibility is to handle this in the client (i.e. the linker) instead. The client has the best idea of whether a given section is a GC root; ELF linkers can set VisibleToRegularObj on any symbol in a section named like a C identifier, and other linkers can do whatever is appropriate for their object format.

I don't see how to get the presence of a section or the section name for a symbol when walking the symbols? I.e. either in gold-plugin via the symbol info from the linker, or in LTO::addModuleToGlobalRes via the InputFile:Symbol?

We'd need to add that information to the irsymtab and expose it via InputFile::Symbol.

That's what I was afraid of. =( That seems quite a bit more extensive than simply checking whether the GV has a section and blocking internalization.
If we were to go the route you are suggesting, it looks like currently we don't iterate through the InputFile until we invoke the LTO API, when adding a module. I guess we would have to then iterate through those and update the global resolutions earlier, e.g. in the gold-plugin? I.e. in order to make it ELF specific.

I don't follow your point about global resolutions, this should be purely a local analysis (from the linker's perspective at least). In both the gold plugin and lld an InputFile is available at the point where we compute the resolution for each symbol, so it should be a matter of copying the property from the InputFile symbol to the resolution.

Yeah I meant the SymbolResolution not the GlobalResolution.

In D35639#815591, @pcc wrote:
In D35639#815579, @pcc wrote:
In D35639#815234, @pcc wrote:

Another possibility is to handle this in the client (i.e. the linker) instead. The client has the best idea of whether a given section is a GC root; ELF linkers can set VisibleToRegularObj on any symbol in a section named like a C identifier, and other linkers can do whatever is appropriate for their object format.

I don't see how to get the presence of a section or the section name for a symbol when walking the symbols? I.e. either in gold-plugin via the symbol info from the linker, or in LTO::addModuleToGlobalRes via the InputFile:Symbol?

We'd need to add that information to the irsymtab and expose it via InputFile::Symbol.

That's what I was afraid of. =( That seems quite a bit more extensive than simply checking whether the GV has a section and blocking internalization.
If we were to go the route you are suggesting, it looks like currently we don't iterate through the InputFile until we invoke the LTO API, when adding a module. I guess we would have to then iterate through those and update the global resolutions earlier, e.g. in the gold-plugin? I.e. in order to make it ELF specific.

I don't follow your point about global resolutions, this should be purely a local analysis (from the linker's perspective at least). In both the gold plugin and lld an InputFile is available at the point where we compute the resolution for each symbol, so it should be a matter of copying the property from the InputFile symbol to the resolution.

Yeah I meant the SymbolResolution not the GlobalResolution.

Actually, I have a variant of this that I prefer. Instead of setting VisibleToRegularObj in each linker, we could handle it similar to the way some COFF specific behavior is handled. I.e. when building the symbol table, if isOSBinFormatELF, and the symbol has a C identifier section name, then set a (new) flag on the symbol indicating as such (e.g. ELFHasCIdentifierSectionName). Then in LTO::addModuleToGlobalRes, we add a check of this flag to the conditions under which VisibleOutsideSummary is set. That has the advantage of avoiding the need to add checking to each ELF linker, and also doesn't overload the VisibleToRegularObj flag with a condition that isn't being visible to a regular object, but rather visible outside the summary in a different way. Sound ok?

Actually, I have a variant of this that I prefer. Instead of setting VisibleToRegularObj in each linker, we could handle it similar to the way some COFF specific behavior is handled. I.e. when building the symbol table, if isOSBinFormatELF, and the symbol has a C identifier section name, then set a (new) flag on the symbol indicating as such (e.g. ELFHasCIdentifierSectionName). Then in LTO::addModuleToGlobalRes, we add a check of this flag to the conditions under which VisibleOutsideSummary is set. That has the advantage of avoiding the need to add checking to each ELF linker, and also doesn't overload the VisibleToRegularObj flag with a condition that isn't being visible to a regular object, but rather visible outside the summary in a different way. Sound ok?

Note in addModuleToGlobalRes it is also added to the set of conditions where we need to set the Partition to External (similar to the places where we currently check things like VisibleToRegularObj. I have this working for a small test case, testing with my full internal test case. If that works I will clean up and send for review. I like this since it is ELF specific but doesn't require any linker changes.

pcc added a comment.Jul 20 2017, 11:32 AM

Actually, I have a variant of this that I prefer. Instead of setting VisibleToRegularObj in each linker, we could handle it similar to the way some COFF specific behavior is handled. I.e. when building the symbol table, if isOSBinFormatELF, and the symbol has a C identifier section name, then set a (new) flag on the symbol indicating as such (e.g. ELFHasCIdentifierSectionName). Then in LTO::addModuleToGlobalRes, we add a check of this flag to the conditions under which VisibleOutsideSummary is set. That has the advantage of avoiding the need to add checking to each ELF linker, and also doesn't overload the VisibleToRegularObj flag with a condition that isn't being visible to a regular object, but rather visible outside the summary in a different way. Sound ok?

Note in addModuleToGlobalRes it is also added to the set of conditions where we need to set the Partition to External (similar to the places where we currently check things like VisibleToRegularObj. I have this working for a small test case, testing with my full internal test case. If that works I will clean up and send for review. I like this since it is ELF specific but doesn't require any linker changes.

Sounds fine to me.

Note that this design would preclude the linker from doing anything smarter with VisibleToRegularObj, like setting it depending on whether the input files contain an undefined reference to the __start_ or __stop_ symbols. We may also need section names in order to properly support certain linker script features with LTO. But I guess we can always change the design if any of that turns out to be necessary/useful.

tejohnson updated this revision to Diff 107579.Jul 20 2017, 1:48 PM

Address comments.

tejohnson retitled this revision from [ThinLTO] Prevent dead stripping and internalization of symbols with sections to [LTO] Prevent dead stripping and internalization of symbols with sections.Jul 20 2017, 2:27 PM
tejohnson edited the summary of this revision. (Show Details)
mehdi_amini added inline comments.Jul 20 2017, 2:46 PM
lib/LTO/LTO.cpp
391 ↗(On Diff #107579)

I don't understand why this is the right place to do this? Why isn't the responsibility of the linker resolution to mark these symbols as externally visible?

tejohnson added inline comments.Jul 20 2017, 2:53 PM
lib/LTO/LTO.cpp
391 ↗(On Diff #107579)

The start_ and end_ symbols for these sections are implicitly generated by the linker. At least in the gold plugin, we may never see the references to these special symbols (e.g. if they were in a regular object not even processed by the plugin). So the symbols that are declared in such sections (provoking the linker's generation of these symbols) need to be treated conservatively. Doing it here covers all ELF linkers.

mehdi_amini added inline comments.Jul 20 2017, 3:20 PM
lib/LTO/LTO.cpp
391 ↗(On Diff #107579)

But this is the reason we have a resolution API: we shouldn't have such "magic" here.
If these symbols must be preserved because the linker may add a reference to it later, the responsibility should be on the linker client of the API to pass the right resolution.

pcc added inline comments.Jul 20 2017, 3:49 PM
lib/LTO/LTO.cpp
391 ↗(On Diff #107579)

I agree with Mehdi, FWIW. I can see the argument for why one might choose to handle this in the API, but it's not the way that I would implement it.

mehdi_amini added inline comments.Jul 20 2017, 3:54 PM
lib/LTO/LTO.cpp
391 ↗(On Diff #107579)

I just thought about an analogy with another piece of magic: we prevent renaming of symbols that have a section name attached. This is also some "magic" (that I added) that isn't in the right place. However it can't be addressed by the "resolution" API, it would need a new construct for the linker / client of the API to express the fact that a symbol can't be renamed.

tejohnson updated this revision to Diff 107985.Jul 24 2017, 3:56 PM

Address review comments, moving handling to gold-plugin (I'd like to get
this one in asap and add the lld support in a follow up).

pcc added a comment.Jul 24 2017, 4:41 PM

s/__end_/__stop_/ in a bunch of places.

include/llvm/Object/IRSymtab.h
130 ↗(On Diff #107985)

I'd prefer to put the entire section name in here so that we can use it for the things I mentioned earlier.

Otherwise this can be moved to Symbol::Flags, there's plenty of space there.

test/tools/gold/X86/global_with_section.ll
10 ↗(On Diff #107985)

Is the "-u foo" part necessary? Same below.

45 ↗(On Diff #107985)

Probably also want a negative test that shows we don't do this to functions with a section name that isn't a C identifier.

tools/gold/gold-plugin.cpp
624 ↗(On Diff #107985)

Probably a little more precise to do assert(InputFileSyms.size() == F.syms.size()) outside of the loop instead.

671 ↗(On Diff #107985)

I'd move the section name check here outside the switch statement.

tools/llvm-lto2/llvm-lto2.cpp
360 ↗(On Diff #107985)

identifier

tejohnson marked 6 inline comments as done.Jul 25 2017, 9:11 AM
tejohnson added inline comments.
include/llvm/Object/IRSymtab.h
130 ↗(On Diff #107985)

Changed to track section name.

test/tools/gold/X86/global_with_section.ll
10 ↗(On Diff #107985)

Turns out no, so I removed foo. This was a carry over from when I was testing the LTO-only version of the fix, using internal tools that expected to have an exported symbol.

tejohnson updated this revision to Diff 108100.Jul 25 2017, 9:12 AM
tejohnson marked 2 inline comments as done.

Address review comments

pcc added inline comments.Jul 25 2017, 10:59 AM
lib/Object/IRSymtab.cpp
256 ↗(On Diff #108100)

I wouldn't check whether the section name is a C identifier here, as that is the client's responsibility. That probably means that the field should be renamed to just SectionName.

tejohnson marked an inline comment as done.Jul 25 2017, 11:30 AM

Address comments

pcc accepted this revision.Jul 25 2017, 12:07 PM

LGTM

This revision is now accepted and ready to land.Jul 25 2017, 12:07 PM
This revision was automatically updated to reflect the committed changes.