This is an archive of the discontinued LLVM Phabricator instance.

Allow new comdat symbols to be added as a result of LTO.
Needs ReviewPublic

Authored by ruiu on Dec 21 2018, 11:07 AM.

Details

Summary

Previously, if you add two comdat symbols as a result of LTO,
they are reported as duplicate symbols because comdat symbols
are not uniquified after LTO.

This patch allows LTO to add comdat symbols. The DenseSet contains
comdat symbols we've seen so far. Instead of discarding the DenseSet
for each file reading, we use the same set to read all compiled
native object files.

I have a few questions about this patch though:

  • Is it assumed for LTO to add new comdat symbols as a result of LTO?
  • How can I test this patch?

Event Timeline

ruiu created this revision.Dec 21 2018, 11:07 AM
xur added a subscriber: xur.Dec 21 2018, 11:16 AM
pcc added a comment.Dec 21 2018, 11:26 AM

LTO optimization passes shouldn't be adding new comdats. Any comdats contained in a bitcode file should be explicit in its symbol table so that they get resolved during symbol resolution. So we should probably fix the bug in whichever pass is adding the comdats instead.

In D56015#1339411, @pcc wrote:

LTO optimization passes shouldn't be adding new comdats. Any comdats contained in a bitcode file should be explicit in its symbol table so that they get resolved during symbol resolution. So we should probably fix the bug in whichever pass is adding the comdats instead.

This patch I assume is due to an issue @xur had with a patch to enable context-sensitive pgo (instrumentation is added in the lto backends). He can give details on the profile related symbol being added.

xur added a comment.Dec 21 2018, 11:52 AM
In D56015#1339411, @pcc wrote:

LTO optimization passes shouldn't be adding new comdats. Any comdats contained in a bitcode file should be explicit in its symbol table so that they get resolved during symbol resolution. So we should probably fix the bug in whichever pass is adding the comdats instead.

Why LTO optimization passes should not add new comdat variables?
I don't see a reason for this.

For my use case, I create a comdat variable to encoce PGO instrumentation information (format, version etc) in each module.
I rely linker to pick one definition. This pass is now in lto backend.

pcc added a comment.Dec 21 2018, 12:36 PM

Why LTO optimization passes should not add new comdat variables?
I don't see a reason for this.

This is necessary so that all symbol resolution can be done in the linker before LTO, which as a general principle is what the resolution-based LTO API tries to do (IIRC when the gold plugin was changed to use the linker's symbol resolutions that ended up fixing a lot of bugs, and the same principle was used to design the new LTO API). Resolving comdats again after LTO wouldn't entirely solve the problem, I believe: I expect that you'd still hit a duplicate symbol error if the linker is passed bitcode files together with a regular object file with the PGO instrumentation information.

For my use case, I create a comdat variable to encoce PGO instrumentation information (format, version etc) in each module.
I rely linker to pick one definition. This pass is now in lto backend.

Can you add the comdat in a pass that runs at compile time?

xur added a comment.Dec 21 2018, 1:33 PM
In D56015#1339541, @pcc wrote:

Why LTO optimization passes should not add new comdat variables?
I don't see a reason for this.

This is necessary so that all symbol resolution can be done in the linker before LTO, which as a general principle is what the resolution-based LTO API tries to do (IIRC when the gold plugin was changed to use the linker's symbol resolutions that ended up fixing a lot of bugs, and the same principle was used to design the new LTO API). Resolving comdats again after LTO wouldn't entirely solve the problem, I believe: I expect that you'd still hit a duplicate symbol error if the linker is passed bitcode files together with a regular object file with the PGO instrumentation information.

Is't the final link of the native objects will also do the resolution for the symbols. If the lto api keeps the comdat property of the symbols, they will be work fine when linking with regular object with the same comdat symbol.

In the absence of this patch, we also need final link to do the comdat resolution if we mixed bitcode with native object, right?

As you mentioned gold, gold works with this scheme. I've been using gold as the linker in my patch for a few months.

I think using a comdat variable is a convenient way to sync the actions b/w different compilation units.

For my use case, I create a comdat variable to encoce PGO instrumentation information (format, version etc) in each module.
I rely linker to pick one definition. This pass is now in lto backend.

Can you add the comdat in a pass that runs at compile time?

The instrumentation transformation and creating of the this symbol are all in one pass. The structure will be weird if I split them (not to mention the handling of options)

xur added a comment.Dec 21 2018, 2:54 PM

pcc was right. The mixing of bitcode file with real object files does still have the duplicate symbol issue, even with this patch.

grimar added a subscriber: grimar.Dec 22 2018, 7:31 AM
xur added a comment.Jan 14 2019, 12:20 PM

An update: I now use a separated pass to create the comdat variables before LTO/ThinLTO. @ruir: You can discard this patch now. Thanks the help from ruir and pcc.