This is an archive of the discontinued LLVM Phabricator instance.

Allow new comdat symbols to be added as a result of LTO.
Needs ReviewPublic

Authored by ruiu on Dec 21 2018, 11:07 AM.

Download Raw Diff

Details

Reviewers

tejohnson
pcc
• espindola

Summary

Previously, if you add two comdat symbols as a result of LTO,
they are reported as duplicate symbols because comdat symbols
are not uniquified after LTO.

This patch allows LTO to add comdat symbols. The DenseSet contains
comdat symbols we've seen so far. Instead of discarding the DenseSet
for each file reading, we use the same set to read all compiled
native object files.

I have a few questions about this patch though:

Is it assumed for LTO to add new comdat symbols as a result of LTO?
How can I test this patch?

Diff Detail

Build Status

Buildable 26236
Build 26235: arc lint + arc unit

Event Timeline

ruiu created this revision.Dec 21 2018, 11:07 AM

Herald added a reviewer: • espindola. · View Herald TranscriptDec 21 2018, 11:07 AM

Herald added subscribers: dexonsmith, arichardson, inglorion and 2 others. · View Herald Transcript

Harbormaster completed remote builds in B26236: Diff 179317.Dec 21 2018, 11:08 AM

MaskRay mentioned this in D56013: [ELF] Use SymbolTable::ComdatGroups instead of DummyGroups in LTO.Dec 21 2018, 11:09 AM

xur added a subscriber: xur.Dec 21 2018, 11:16 AM

LTO optimization passes shouldn't be adding new comdats. Any comdats contained in a bitcode file should be explicit in its symbol table so that they get resolved during symbol resolution. So we should probably fix the bug in whichever pass is adding the comdats instead.

In D56015#1339411, @pcc wrote:

LTO optimization passes shouldn't be adding new comdats. Any comdats contained in a bitcode file should be explicit in its symbol table so that they get resolved during symbol resolution. So we should probably fix the bug in whichever pass is adding the comdats instead.

This patch I assume is due to an issue @xur had with a patch to enable context-sensitive pgo (instrumentation is added in the lto backends). He can give details on the profile related symbol being added.

In D56015#1339411, @pcc wrote:

LTO optimization passes shouldn't be adding new comdats. Any comdats contained in a bitcode file should be explicit in its symbol table so that they get resolved during symbol resolution. So we should probably fix the bug in whichever pass is adding the comdats instead.

Why LTO optimization passes should not add new comdat variables?
I don't see a reason for this.

For my use case, I create a comdat variable to encoce PGO instrumentation information (format, version etc) in each module.
I rely linker to pick one definition. This pass is now in lto backend.

Why LTO optimization passes should not add new comdat variables?
I don't see a reason for this.

This is necessary so that all symbol resolution can be done in the linker before LTO, which as a general principle is what the resolution-based LTO API tries to do (IIRC when the gold plugin was changed to use the linker's symbol resolutions that ended up fixing a lot of bugs, and the same principle was used to design the new LTO API). Resolving comdats again after LTO wouldn't entirely solve the problem, I believe: I expect that you'd still hit a duplicate symbol error if the linker is passed bitcode files together with a regular object file with the PGO instrumentation information.

For my use case, I create a comdat variable to encoce PGO instrumentation information (format, version etc) in each module.
I rely linker to pick one definition. This pass is now in lto backend.

Can you add the comdat in a pass that runs at compile time?

In D56015#1339541, @pcc wrote:

Why LTO optimization passes should not add new comdat variables?
I don't see a reason for this.

This is necessary so that all symbol resolution can be done in the linker before LTO, which as a general principle is what the resolution-based LTO API tries to do (IIRC when the gold plugin was changed to use the linker's symbol resolutions that ended up fixing a lot of bugs, and the same principle was used to design the new LTO API). Resolving comdats again after LTO wouldn't entirely solve the problem, I believe: I expect that you'd still hit a duplicate symbol error if the linker is passed bitcode files together with a regular object file with the PGO instrumentation information.

Is't the final link of the native objects will also do the resolution for the symbols. If the lto api keeps the comdat property of the symbols, they will be work fine when linking with regular object with the same comdat symbol.

In the absence of this patch, we also need final link to do the comdat resolution if we mixed bitcode with native object, right?

As you mentioned gold, gold works with this scheme. I've been using gold as the linker in my patch for a few months.

I think using a comdat variable is a convenient way to sync the actions b/w different compilation units.

For my use case, I create a comdat variable to encoce PGO instrumentation information (format, version etc) in each module.
I rely linker to pick one definition. This pass is now in lto backend.

Can you add the comdat in a pass that runs at compile time?

The instrumentation transformation and creating of the this symbol are all in one pass. The structure will be weird if I split them (not to mention the handling of options)

pcc was right. The mixing of bitcode file with real object files does still have the duplicate symbol issue, even with this patch.

grimar added a subscriber: grimar.Dec 22 2018, 7:31 AM

An update: I now use a separated pass to create the comdat variables before LTO/ThinLTO. @ruir: You can discard this patch now. Thanks the help from ruir and pcc.

pcc mentioned this in D56954: hwasan: Move memory access checks into small outlined functions on aarch64..Jan 22 2019, 4:44 PM

Revision Contents

Path

Size

lld/

ELF/

SymbolTable.cpp

4 lines

Diff 179317

lld/ELF/SymbolTable.cpp

Show First 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	template <class ELFT> void SymbolTable::addCombinedLTOObject() {
if (BitcodeFiles.empty())		if (BitcodeFiles.empty())
return;		return;

// Compile bitcode files and replace bitcode symbols.		// Compile bitcode files and replace bitcode symbols.
LTO.reset(new BitcodeCompiler);		LTO.reset(new BitcodeCompiler);
for (BitcodeFile *F : BitcodeFiles)		for (BitcodeFile *F : BitcodeFiles)
LTO->add(*F);		LTO->add(*F);

		DenseSet<CachedHashStringRef> Groups;
for (InputFile *File : LTO->compile()) {		for (InputFile *File : LTO->compile()) {
DenseSet<CachedHashStringRef> DummyGroups;
auto *Obj = cast<ObjFile<ELFT>>(File);		auto *Obj = cast<ObjFile<ELFT>>(File);
Obj->parse(DummyGroups);		Obj->parse(Groups);
for (Symbol *Sym : Obj->getGlobalSymbols())		for (Symbol *Sym : Obj->getGlobalSymbols())
Sym->parseSymbolVersion();		Sym->parseSymbolVersion();
ObjectFiles.push_back(File);		ObjectFiles.push_back(File);
}		}
}		}

// Set a flag for --trace-symbol so that we can print out a log message		// Set a flag for --trace-symbol so that we can print out a log message
// if a new symbol with the same name is inserted into the symbol table.		// if a new symbol with the same name is inserted into the symbol table.
▲ Show 20 Lines • Show All 662 Lines • Show Last 20 Lines